SciML / Catalyst.jl

Chemical reaction network and systems biology interface for scientific machine learning (SciML). High performance, GPU-parallelized, and O(1) solvers in open source software.
https://docs.sciml.ai/Catalyst/stable/

Discovery of chemical reactions-pathways using neural networks #264

Open yewalenikhil65 opened 3 years ago

yewalenikhil65 commented 3 years ago

Not sure if this is the right place to post, but I am posting here since Catalyst seems to be the only place related to chemical reactions. I recently came across a paper on deriving unknown chemical reaction pathways using a neural net:

https://arxiv.org/abs/2002.09062

The paper discusses a physically interpretable chemical reaction neural network (CRNN) based on the law of mass action: for a reaction aA + bB --> cC + dD, the rate can be expressed via the law of mass action as R = k[A]^a[B]^b = exp(ln(k) + a ln[A] + b ln[B]).

So one can treat the activation function as the exponential, the weights of the input layer as the stoichiometric coefficients of the reactants, the inputs as ln([species]), and the bias as ln(rate constant).
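For illustration, a single CRNN "reaction node" for aA + bB --> products could be written like this (a minimal sketch; the names are hypothetical, not from the paper's code):

# one CRNN node: R = exp(ln(k) + a*ln[A] + b*ln[B]) = k*[A]^a*[B]^b
rate(conc, coeffs, lnk) = exp(lnk + sum(coeffs .* log.(conc)))

k      = 0.3          # rate constant; the node's bias is log(k)
coeffs = [1.0, 2.0]   # reaction orders / stoichiometric coefficients of A and B (input weights)
conc   = [0.5, 0.2]   # concentrations [A], [B] (the inputs are log.(conc))
R = rate(conc, coeffs, log(k))   # == 0.3 * 0.5^1 * 0.2^2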

Although not a generalized neural-net framework for ODEs, this could really be a good addition to SciML/Catalyst for chemical reactions.

I am currently trying to implement this using Julia.

A minor query that arises while going through this paper: the weights of the input layer are required to be positive (as the stoichiometric coefficients of the reactants are positive).

I couldn't find an appropriate way to do this using Flux. Let me know if anyone finds this interesting.

ChrisRackauckas commented 3 years ago

Might as well ping @jiweiqi. You could do this quite straightforwardly in Flux. It would be good to implement it.

jiweiqi commented 3 years ago

Hi @yewalenikhil65, sorry for my late reply to your e-mail. I am happy that you are trying to find a solution using Julia; I didn't know that you also use Julia. I do have a Julia implementation of Case I of the arXiv paper, as shown below. Although I have made some improvements since then, this will demonstrate the idea, and we can discuss the improvements.

# Reference: Ji, Weiqi, and Sili Deng. "Autonomous Discovery of Unknown Reaction Pathways from Data by Chemical Reaction Neural Network." arXiv preprint arXiv:2002.09062 (2020).
using DiffEqFlux, OrdinaryDiffEq, Flux, Optim, Plots
using Flux: gradient
using Flux.Optimise: update!
using Random

n_plot = 10
n_epoch = 100000
n_exp = 30

function trueODEfunc(dydt, y, k, t)
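    # Reactions encoded by this RHS (mass action):
    #   r1: 2 y1    -> y2   rate k[1]*y1^2
    #   r2: y1      -> y3   rate k[2]*y1
    #   r3: y3      -> y4   rate k[3]*y3
    #   r4: y2 + y4 -> y5   rate k[4]*y2*y4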
    dydt[1] = -2 * k[1] * y[1]^2 - k[2] * y[1]
    dydt[2] = k[1] * y[1]^2 - k[4] * y[2] * y[4]
    dydt[3] = k[2] * y[1] - k[3] * y[3]
    dydt[4] = k[3] * y[3] - k[4] * y[2] * y[4]
    dydt[5] = k[4] * y[2] * y[4]
end

ns = 5
nr = 4
u0_list = rand(Float32, (n_exp, ns))
u0_list[:, 3:ns] .= 0

datasize = 20
tspan = Float32[0.0, 20.0]
tsteps = range(tspan[1], tspan[2], length = datasize)
k = Float32[0.1, 0.2, 0.13, 0.3]
# Tsit5() seems to be more efficient and robust here
alg = Tsit5()  #Rosenbrock23(autodiff = false)

ode_data_list = []

for i in 1:n_exp
    u0 = u0_list[i, :]
    prob_trueode = ODEProblem(trueODEfunc, u0, tspan, k)
    ode_data = Array(solve(prob_trueode, alg, saveat = tsteps))
    push!(ode_data_list, ode_data)
end

lb = 1e-5
ub = 10.0

dudt2 = FastChain((x, p)->log.(clamp.(x, lb, ub)),
                  FastDense(5, 4, exp),
                  FastDense(4, 5))

prob_neuralode = NeuralODE(dudt2, tspan, alg, saveat = tsteps)
p = prob_neuralode.p

function clip_p(p)
    # layer1: 5x4+4 = 24
    # layer2: 4x5+5 = 25
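    # Parameter layout of the FastChain above:
    #   p[1:20]  input-layer weights  -> reactant stoichiometric coefficients, kept in [0, 2.5]
    #   p[21:24] input-layer biases   -> ln of the rate constants
    #   p[25:44] output-layer weights -> net stoichiometry (column j = effect of reaction j)
    #   p[45:49] output-layer bias    -> pinned to zero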
    p[1:20] .= clamp.(p[1:20], 0, 2.5)
    p[45:49] .= 0

    return p
end

# For debugging the code, manually set the NN weights
# function set_p(p)
#     p = zeros(49)
#     p[1:4] = [2,1,0,0]
#     p[5:8] = [0,0,0,1]
#     p[9:12] = [0,0,1,0]
#     p[13:16] = [0,0,0,1]
#     p[17:20] = [0,0,0,0]

#     p[21:24] .= log.([0.1, 0.2, 0.13, 0.3])

#     p[25:29] .= [-2,  1,  0,  0, 0]
#     p[30:34] .= [-1,  0,  1,  0, 0]
#     p[35:39] .= [ 0,  0, -1,  1, 0]
#     p[40:44] .= [ 0, -1,  0, -1, 1]
#     return p
# end

function predict_neuralode(u0, p)
    pred = clamp.(Array(prob_neuralode(u0, p)), -ub, ub)
    return pred
end

function loss_neuralode(p, i_exp)
    u0 = u0_list[i_exp, :]
    ode_data = ode_data_list[i_exp]
    pred = predict_neuralode(u0, p)
    loss = sum(abs2, ode_data .- pred)
    return loss
end

function loss_pred_neuralode(p, i_exp)
    u0 = u0_list[i_exp, :]
    ode_data = ode_data_list[i_exp]
    pred = predict_neuralode(u0, p)
    loss = sum(abs2, ode_data .- pred)
    return loss, pred
end

function display_p(p)
    println("s1")
    println(p[1:4])
    println("s2")
    println(p[5:8])
    println("s3")
    println(p[9:12])
    println("s4")
    println(p[13:16])
    println("s5")
    println(p[17:20])
end

# Callback function to observe training

cbi = function (p, i_exp)

    ode_data = ode_data_list[i_exp]
    pred = predict_neuralode(u0_list[i_exp, :], p)
    list_plt = []
    for i in 1:ns
        plt = scatter(tsteps, ode_data[i,:], title = string(i), label = string("data_",i))
        plot!(plt, tsteps, pred[i,:], label = string("pred_",i))
        push!(list_plt, plt)
    end
    plt_all = plot(list_plt..., legend = false)
    png(plt_all, string("figs/i_exp_", i_exp))
    return false
end

list_loss = []
iter = 0
cb = function (p, loss_mean)
    global list_loss, iter
    push!(list_loss, loss_mean)
    println(string("\n", "iter = ", iter, " loss = ", loss_mean, "\n"))

    if iter % n_plot == 0
        display_p(p)

        for i_exp in [1, 10, 20, 30]
            cbi(p, i_exp)
        end

        if iter < 2000
            plt_loss = plot(list_loss, yscale = :log10, label="loss")
        else
            plt_loss = plot(list_loss, xscale = :log10, yscale = :log10, label="loss")
        end
        png(plt_loss, "figs/loss")
    end

    iter += 1

end

opt = ADAM(0.001)

p = clip_p(p)

loss_all = zeros(Float32, n_exp)

for epoch in 1:n_epoch
    global p, iter
    for i_exp in randperm(n_exp)
        loss, back = Flux.pullback(loss_neuralode, p, i_exp)
        update!(opt, p, back(one(loss))[1])
        p = clip_p(p)
        loss_all[i_exp] = loss
        #println(string("epoch=", iter, " i_exp=", i_exp, " loss=", loss))
    end
    loss_mean = sum(loss_all)/n_exp
    cb(p, loss_mean)
end

The loss function will look like this: [image]. Put the CRNN into an ODE integrator: [image].

BTW, the above code uses adjoint sensitivity from DiffEq. @ChrisRackauckas

jiweiqi commented 3 years ago

A minor query that arises while going through this paper: the weights of the input layer are required to be positive (as the stoichiometric coefficients of the reactants are positive).

I couldn't find an appropriate way to do this using Flux. Let me know if anyone finds this interesting.

Regarding the requirement of positive weights: initially, I used p = clip_p(p). Some people suggest using the hook function in Zygote, but I don't know how to do that. These days, I prefer manually defining the layers instead of using the Dense module; then it is easy to enforce non-negative weights, for instance by using abs(w_in) in the layer.
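For example, a minimal sketch of such a manually defined CRNN layer (illustrative only; CRNNLayer and its field names are hypothetical, not from the paper's code). Taking abs.() of the raw input weights inside the forward pass keeps the effective stoichiometric coefficients non-negative while the underlying parameters remain unconstrained:

using Flux

struct CRNNLayer
    w_in    # ns x nr raw input weights (reaction orders of the reactants)
    b_in    # nr biases (ln of the rate constants)
    w_out   # nr x ns output weights (net stoichiometry)
end
Flux.@functor CRNNLayer

function (m::CRNNLayer)(c)
    # abs.() enforces non-negative effective reaction orders
    r = exp.(m.b_in .+ abs.(m.w_in)' * log.(clamp.(c, 1f-5, 10f0)))
    return m.w_out' * r              # dy/dt contributed by all reactions
end

layer = CRNNLayer(0.1f0 .* randn(Float32, 5, 4),
                  0.1f0 .* randn(Float32, 4),
                  0.1f0 .* randn(Float32, 4, 5))
dcdt = layer(rand(Float32, 5))       # predicted dy/dt for one state vector

Flux.params(layer) then exposes w_in, b_in, and w_out for training as usual.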

jiweiqi commented 3 years ago

Another note: you can use a non-stiff solver such as Tsit5() here, since the system is not really stiff, so a non-stiff solver will likely be faster.

jiweiqi commented 3 years ago

BTW, it is straightforward to hybridize Catalyst.jl and CRNN, i.e., combine the known pathways and the unknown pathways in the function dydt.
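A minimal sketch of the idea (hypothetical names, not Catalyst's actual API): write the known pathways explicitly, e.g. taken from a Catalyst model, and add the CRNN output for the unknown pathways inside the same RHS:

# hybrid RHS: known mass-action pathway(s) plus a CRNN term for the unknown pathways;
# `crnn` is whatever CRNN layer/closure is being trained, `k_known` the known rate constants
function hybrid_dydt!(dydt, y, p, t)
    k_known, crnn = p
    dydt .= 0
    # known pathway, e.g. y1 -> y3 at rate k_known[1]*y1
    dydt[1] -= k_known[1] * y[1]
    dydt[3] += k_known[1] * y[1]
    # unknown pathways contributed by the CRNN
    dydt .+= crnn(y)
    return nothing
end

`prob = ODEProblem(hybrid_dydt!, u0, tspan, (k_known, crnn))` can then be trained just like the pure-CRNN case.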

yewalenikhil65 commented 3 years ago

Hello @jiweiqi, glad you are here. I know Julia but am new to the SciML framework, and I am getting acquainted with it. The syntax is definitely much friendlier than the PyTorch code. I have more queries related to the method itself in the paper and the code, which I will mail you in detail later today.

Meanwhile, I am playing around with this code and seem to run into the following warning in the first couple of iterations (denoted by iter in the code):

dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable.

jiweiqi commented 3 years ago

Glad to hear more discussions from you.

Regarding `dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable.`: which ODE solver are you using, Tsit5() or Rosenbrock?

yewalenikhil65 commented 3 years ago

Rosenbrock23(autodiff = false), as in the code.

jiweiqi commented 3 years ago

I remember this code ran successfully at the time. It could be due to version changes in Julia and DiffEq. Meanwhile, does the code keep running after the error message? If it is still running, then you can ignore it.

ChrisRackauckas commented 3 years ago

It could be due to the initial network you get. Or, if you end up at some random bad parameters, you can hit that and have the optimizer reject those parameters.

yewalenikhil65 commented 3 years ago

The code is still running:

iter = 53 loss = 3.6875534
┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable.
└ @ DiffEqBase ~/.julia/packages/DiffEqBase/zXRWE/src/integrator_interface.jl:343
iter = 54 loss = 3.6740432
iter = 55 loss = 3.6123388
iter = 56 loss = 3.5788953
iter = 57 loss = 3.5316973
iter = 58 loss = 3.6371818

but the warning message occurs intermittently. The weights (coefficients) do not seem to be converging, though:

iter = 60 loss = 3.4807959
s1 Float32[0.04838549, 0.0, 0.052780855, 0.0818088]
s2 Float32[0.028086396, 0.031885616, 0.34444085, 0.028887317]
s3 Float32[0.7293116, 0.00031588576, 0.27249622, 0.041405488]
s4 Float32[0.13578044, 0.26466164, 0.029718699, 0.029005153]
s5 Float32[0.088181436, 0.34516084, 0.25139907, 0.74834317]

i.e., for the first reaction (the first column of the rows above), more than two species have coefficients approaching 1.0, as can be seen in the following iteration:

iter = 80 loss = 2.7691839
s1 Float32[0.048153307, 0.0, 0.072791025, 0.08247974]
s2 Float32[0.03963388, 0.0644407, 0.2503924, 0.024419485]
s3 Float32[0.7418091, 0.00031526515, 0.27819398, 0.042485863]
s4 Float32[0.13934119, 0.26573816, 0.025210617, 0.028881501]
s5 Float32[0.10496614, 0.34978426, 0.24204765, 0.74635804]

jiweiqi commented 3 years ago

Oh yes, a couple of potential reasons:

I was not able to get rid of those warnings. I am working on some tricks that might improve it, for example better initialization. Yingbo suggested that small initial weights benefit the training of neural ODEs in general, say weights around 1e-5. But in general, the warning doesn't affect the training results too much. As long as the loss function is dropping and the weights are converging to a good solution, it is fine.
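For instance (a sketch, not part of the original script), the NeuralODE parameters from the code above could simply be scaled down before training starts:

p = prob_neuralode.p
p[1:20]  .*= 1f-1    # input-layer weights; use an even smaller factor such as 1f-5 if needed
p[25:44] .*= 1f-1    # output-layer weights
p = clip_p(p)        # re-apply the non-negativity / zero-bias constraints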

jiweiqi commented 3 years ago

Another trick that works is to use a larger tolerance for the ODE solver initially, and then gradually tighten the tolerance.
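For example (a sketch; the tolerance values are arbitrary), the NeuralODE from the script above can be rebuilt with looser tolerances early on and tighter ones later, since its keyword arguments such as abstol/reltol are forwarded to solve:

# looser tolerances for the early epochs
prob_neuralode = NeuralODE(dudt2, tspan, alg, saveat = tsteps, abstol = 1f-3, reltol = 1f-3)
# ... after the loss has come down, tighten the tolerances; keep the trained parameter
# vector p, since prob_neuralode(u0, p) is always called with p passed explicitly
prob_neuralode = NeuralODE(dudt2, tspan, alg, saveat = tsteps, abstol = 1f-6, reltol = 1f-6)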

jiweiqi commented 3 years ago

Rosenbrock23(autodiff = false), as in the code.

I am also running the code on my PC with Julia 1.5.1. It seems that alg = Tsit5() is faster for this problem and no warning is raised.

yewalenikhil65 commented 3 years ago

The loss is dropping, although much more slowly than with your PyTorch code (the coefficients are not yet converging to the expected values). Quick questions about the code:

X_train is the data from the ODESolution (for the 30 experiments), of size 5 x 30*length(t).
Y_train is the RHS of the ODE system (for the same 30 experiments), of size 5 x 30*length(t).
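# One way to assemble these from the script above (a sketch; ode_data_list,
# trueODEfunc, k and ns are assumed to come from that script):
X_train = reduce(hcat, ode_data_list)      # ns x (n_exp*length(t))
Y_train = similar(X_train)
for j in axes(X_train, 2)
    dydt = zeros(Float32, ns)
    trueODEfunc(dydt, X_train[:, j], k, 0f0)
    Y_train[:, j] .= dydt
end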

using Flux
using IterTools: ncycle   # ncycle is used below to repeat the data loader for many epochs

train_loader = Flux.Data.DataLoader((X_train, Y_train), batchsize = 5, shuffle = true)

# Chain layers take a single argument; the (x, p) signature is only for FastChain
CRNN = Chain(x -> log.(clamp.(x, lb, ub)),
             Dense(num_species, num_reaction, exp),
             Dense(num_reaction, num_species))

p = Flux.params(CRNN)
# parameters W, b of the neural net

loss(x, y) = Flux.Losses.mse(CRNN(x), y)
# define the loss function as MSE

evalcb = () -> @show(loss(X_train, Y_train))
# callback function for monitoring the loss

Flux.train!(loss, p, ncycle(train_loader, 10000), ADAM(0.001), cb = Flux.throttle(evalcb, 1))
# train on the repeated data set (ncycle(train_loader, 10000) ~ 10000 epochs)
# with the ADAM optimiser; the callback prints the loss at most once per second
p[25:29] .= [-2,  1,  0,  0, 0]
p[30:34] .= [-1,  0,  1,  0, 0]
p[35:39] .= [ 0,  0, -1,  1, 0]
p[40:44] .= [ 0, -1,  0, -1, 1]
yewalenikhil65 commented 3 years ago

Rosenbrock23(autodiff = false), as in the code.

I am also running the code on my PC with Julia 1.5.1. It seems that alg = Tsit5() is faster for this problem and no warning is raised.

I am running Julia 1.5.2; I will also try alg = Tsit5() with it.

jiweiqi commented 3 years ago

Sure, you can do it the same way as in the paper. The benefit of the neural ODE approach is that it works even when only part of the species are measured. But, again, if you have all of the species measured, the approach in the paper might be more efficient.

The output layer is not initialized there. That part of the code was for debugging and it is actually not being used, right?

yewalenikhil65 commented 3 years ago

The loss function will look like this: [image]. Put the CRNN into an ODE integrator: [image].

I obtained the following training results (I had to interrupt, as the loss was increasing) using Rosenbrock23: [i_exp_1 plot]. Running with Tsit5() as well; the loss function looked something like this: [loss plot].

yewalenikhil65 commented 3 years ago

The output layer is not initialized there. That part of the code was for debugging and it is actually not being used, right?

Sorry @jiweiqi, my mistake, I got confused. I realise set_p(p) is not being called at all in the code; I got it confused with the names clip_p() and set_p(). You are right, it should be used for the debugging step later.

jiweiqi commented 3 years ago

The loss function will look like this: [image]. Put the CRNN into an ODE integrator: [image].

I obtained the following training results (I had to interrupt, as the loss was increasing) using Rosenbrock23: [i_exp_1 plot]. Running with Tsit5() as well; the loss function looked something like this: [loss plot].

Yeah, your loss looks good. Give it some time and run for 1e4 iterations (based on my plot).

yewalenikhil65 commented 3 years ago

@jiweiqi, a couple more questions:

1. Any comment on the reason for clamping the model input (upper bound ub) in the FastChain function to log(10)?
2. The predict_neuralode(u0, p) function also clamps from -ub to ub. Any particular reason to also assign a negative lower bound?
3. Any general guideline on determining this upper bound for certain reaction pathways?

jiweiqi commented 3 years ago

@jiweiqi, a couple more questions:

  1. Any comment on the reason for clamping the model input (upper bound ub) in the FastChain function to log(10)?

Sometimes x could become very large if the reaction rate constants are very large, although in reality this is usually not the case, unless there is some kind of explosion, like the explosion of COVID case numbers. For this particular case, it seems that having an upper bound helps to stabilize the training.

  2. The predict_neuralode(u0, p) function also clamps from -ub to ub. Any particular reason to also assign a negative lower bound?

Similar to Q1: employing bounds here bounds the loss function and the potentially exploding gradients. In SGD/Adam, we would like the gradients not to be too large, so both lower and upper bounds are needed. Putting the lower bound at zero would make the loss function insensitive to predicted negative concentrations. For example, one parameter set might lead to x = -10 and another to x = -1000; we would like the loss function to reflect the difference between them.

  3. Any general guideline on determining this upper bound for certain reaction pathways?

A couple of reference values: (1) the maximum concentrations in the system, which can be obtained from the measured time-series data; (2) the bound also helps bound the gradient values, which can be determined by diagnosing the parameters' gradients. For instance, choose an upper bound that helps to clip the gradients.
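For instance, a quick diagnostic (reusing the pullback already in the training loop above) is to print the largest parameter gradient for a few experiments and pick ub so that it stays at a reasonable magnitude:

# inspect gradient magnitudes for a few experiments to help choose the bounds
for i_exp in (1, 10, 20)
    loss, back = Flux.pullback(loss_neuralode, p, i_exp)
    grad = back(one(loss))[1]
    println("i_exp = ", i_exp, "  loss = ", loss, "  max |grad| = ", maximum(abs, grad))
end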

jiweiqi commented 3 years ago

Hi @yewalenikhil65 , are you able to get the expected weights now?

yewalenikhil65 commented 3 years ago

Hi @yewalenikhil65 , are you able to get the expected weights now?

Hello Sir @jiweiqi, thank you for letting me know about those lower- and upper-bound details. I ran the code twice using the Tsit5() alg. The first time, the losses did converge within 1e4 iterations, as per your plots (and the trained concentrations matched the true ODE solution), but the weights converged only approximately to those shown in the paper, not exactly (as attached below): [image]

The second time, I re-ran the same code but encountered completely different behavior of the loss function, as shown in the figure below. The loss stayed constant even after 350 iterations. (It is taking a huge amount of time to run, as my laptop is not of very good configuration and crashes frequently; thankfully, I am getting access to a good PC tomorrow.) [loss plot]

Regarding the physical system I was trying to model using the NN: I will mail you tomorrow; I was gathering some details and references.