SciML / PubChemReactions.jl

Generation of reaction networks from PubChem data
5 stars 1 forks source link

roadmap, random thoughts #11

Open anandijain opened 2 years ago

anandijain commented 2 years ago

The problem is, currently the amount of "chemistry" that you can do with SciML/Catalyst/Symbolics is quite limited.

@utkarsh530, I recall you were looking for a reaction rate coefficients database (right?). I think the better way to do this is to actually keep track of each Atom something like:

Look at src/particle.jl for my scratch work

struct Atom protons neutrons electrons end

with the eventual goal being that we can use the formulas in https://en.wikipedia.org/wiki/Reaction_rate_constant and lower the ReactionSystem to a PDE (with temperature as the 2nd IV). I think for now, it is best to stick to the Arrhenius equation for calculating k(T) and avoid having to do statistical mechanics.

This will allow us to be able to make a Catalyst.Reaction constructor that doesn't require putting in a Rate (as it will be auto-determined by Arrhenius + mass action law).

We will need to know how to calculate (or lookup) bond enthalpies in order to calculate the activation energy required for the reaction (which determines the temperature dependence for the reaction rate constant).

One issue that I've run into is deciding the right level of abstraction. It should be obvious we do not want to go as far as to represent protons via quarks, but for anybody doing nuclear chemistry, we would need to know how to calculate the energy released from radioactive decay and other nuclear processes.

For now I think a good starting place is just sticking to the electromagnetic forces (and leaving nuclear things for the future). So we can avoid needing to know which isotope each atom is, but we will still need to know the valence of each atom.

The reason being we need to know which bonds are broken/formed to calculate net energy. The issue here is that PubChem compounds does not provide valence charges (I think). They do provide the compound charge and bond order, but I don't think there is unique mapping to determine valences.

https://github.com/SciML/Catalyst.jl/issues/424 it would be cool to be able to lower rxnsys -> ODE/PDE/Jump in a way that ensures a term for energy that allows conservation of energy.

TLDR; I want to teach a bit of chemistry to Catalyst. this will give users much more peace of mind that the model from literature and the julia code are on the same page.

I'm also not entirely certain about everything here (since it's been a while since I've done any chemistry). Feedback much appreciated

Roadmap and scope

There are a number of things that Catalyst does not support natively that are useful to those modeling chemical reactions.

Particularly, dealing with compounds, elements, and elementary particles is (currently) out of scope for Catalyst.

However, there are many cases when incorporating this extra data (atom-bond graphs, valence charges, masses, etc) is useful for ensuring model correctness and enforcing constraints like conservation of energy and checking that reactions are balanced.

for example, the following reaction in catalyst, there is no way for it to determine that the reaction is unbalanced.

Elementary reactions that I'd want to be able to compute/simulate with a symbolic chemistry tool: (least difficult to most)

https://en.wikipedia.org/wiki/Chemical_reaction

# autobalancing, and conservation of energy 
# i'd want to be able to calculate the net enthalpy 
# this isn't too difficult with what I have now,
# if we have a lookup table for bond enthalpies 

#  (the issue here is that I don't keep track of bond type, nor the number of bonds between two atoms)
# currently it is a SimpleGraph, but i dont know if it makes sense to try to use a multigraph, even then, for ionic bonds, what to do?
reaction"CH4 + O2 -> H2O + CO2"

# https://en.wikipedia.org/wiki/Deuterium%E2%80%93tritium_fusion
# the issue with this one is that right now, the @reaction_str macro requires that each of the species is a pubchem Compound.
# however there aren't compounds for things like individual electrons, neutrons, protons
# again here I want to be able to know that 17.6 MeV of energy is released
reaction"H2 + H3 -> He + Neutron"

# phase changes 
H2O (liquid) (25 C) -> H2O (gas) (150 C)

# PDE with temperature change caused by a reaction, with the eventual goal of modeling a protein being denatured by this temperature change
# specifying the compartment and the boundary conditions
# lets say its a closed system and we have the initial temperature,
# we specify so and so moles react and we specify (or autodetermine) in the reaction rate the dependence of the protein on T (arrhenius)
# we end up finding that early in the reaction, the protein was being consumed but as the compartment solution heated up, the rate goes down 
reactions"""
NaOH (aq) + HCl (aq) → NaCl (aq) + H2O (l) 
protein + nacl -> ...
"""

there are a number of decisions to make, like whether energy is implicit or explicit, how much should try to be computed vs just doing a lookup etc.

@utkarsh530 look over my latest commit and mess around wiht stuff, see what you can do

utkarsh530 commented 2 years ago

Hi Anand,

I think these ideas are quite close to what we might be doing with this repo, and in fact they are brilliant. Let me try to address things one-by-one:

  1. Enthalpies for rate constant: I think your idea is quite nice to use bond enthalpies etc to calculate activation energy, but I am not sure how much that precise it would be. I think the error would vary reaction to reaction. I think that was the purpose of the large handbooks and tables which had verified values. I am ready to give a second thought for automatic calculation, but I believe that researchers would know these values in advance. We do need to figure out which rate law we need to use (which can also be supplied by the user).
  2. Atomic and Nuclear balance: I believe this feature can be implemented, I believe at least for simpler-medium hard use cases. I believe there are pre-existing algorithms to check balance in an easier way extending your current implementation to energies and stuff

We can start implemented your abstractions of Atom. We might not able to fully use it currently and it will take quite some time to reach your current vision with this repo.

Apologies for my not-so-prompt replies, I am currently balancing grad-school applications and hiring process with JC work.

utkarsh530 commented 2 years ago

I will try to finalise #10 by this week

anandijain commented 2 years ago

Awesome, thanks. Since there is a lot to be done, I wouldn't try to do it in a week. One thing that I found that might be helpful is https://github.com/JuliaPhysics/PeriodicTable.jl There is an issue about supporting isotopes.

I'm really hoping that we can make a nice interface to make public and show to https://github.com/isaacsas (main Catalyst dev).

He will probably have some very good ideas on how to incorporate more chemistry into the simulators. How much we decide to open source is more up to Chris/Viral, but I don't have a problem open sourcing most (if not all) of this repo.

But you will find that the code I've written in that PR is pretty god-awful (seriously its very bad). You will probably do much better to make a new PR and just copy bits and pieces of it.

anandijain commented 2 years ago

another thing that would be cool to be able to simulate is reactions involving photons, like this pathway.

https://en.wikipedia.org/wiki/Photoreceptor_cell. also photosynthesis