Open yewalenikhil65 opened 3 years ago
Might as well ping @jiweiqi. You could do this quite straightforwardly in Flux. It would be good to implement it.
Hi @yewalenikhil65, sorry for my late to your e-mail. I am happy that you try to find solutions using Julia. I didn't know that you also use Julia. I do have a Julia implementation of the Case I of the arXiv paper, as shown below. Although I have made some improvements later, this will demonstrate the idea, and we can discuss the improvements.
# Reference: Ji, Weiqi, and Sili Deng. "Autonomous Discovery of Unknown Reaction Pathways from Data by Chemical Reaction Neural Network." arXiv preprint arXiv:2002.09062 (2020).
using DiffEqFlux, OrdinaryDiffEq, Flux, Optim, Plots
using Flux: gradient
using Flux.Optimise: update!
using Random
n_plot = 10
n_epoch = 100000
n_exp = 30
function trueODEfunc(dydt, y, k, t)
dydt[1] = -2 * k[1] * y[1]^2 - k[2] * y[1]
dydt[2] = k[1] * y[1]^2 - k[4] * y[2] * y[4]
dydt[3] = k[2] * y[1] - k[3] * y[3]
dydt[4] = k[3] * y[3] - k[4] * y[2] * y[4]
dydt[5] = k[4] * y[2] * y[4]
end
ns = 5
nr = 4
u0_list = rand(Float32, (n_exp, ns))
u0_list[:, 3:ns] .= 0
datasize = 20
tspan = Float32[0.0, 20.0]
tsteps = range(tspan[1], tspan[2], length = datasize)
k = Float32[0.1, 0.2, 0.13, 0.3]
# Tsit5() seems to be more effcient and robust here
alg = Tsit5() #Rosenbrock23(autodiff = false)
ode_data_list = []
for i in 1:n_exp
u0 = u0_list[i, :]
prob_trueode = ODEProblem(trueODEfunc, u0, tspan, k)
ode_data = Array(solve(prob_trueode, alg, saveat = tsteps))
push!(ode_data_list, ode_data)
end
lb = 1e-5
ub = 10.0
dudt2 = FastChain((x, p)->log.(clamp.(x, lb, ub)),
FastDense(5, 4, exp),
FastDense(4, 5))
prob_neuralode = NeuralODE(dudt2, tspan, alg, saveat = tsteps)
p = prob_neuralode.p
function clip_p(p)
# layer1: 5x4+4 = 24
# layer2: 4x5+5 = 25
p[1:20] .= clamp.(p[1:20], 0, 2.5)
p[45:49] .= 0
return p
end
# For debuging code, manually set the NN weights
# function set_p(p)
# p = zeros(49)
# p[1:4] = [2,1,0,0]
# p[5:8] = [0,0,0,1]
# p[9:12] = [0,0,1,0]
# p[13:16] = [0,0,0,1]
# p[17:20] = [0,0,0,0]
# p[21:24] .= log.([0.1, 0.2, 0.13, 0.3])
# p[25:29] .= [-2, 1, 0, 0, 0]
# p[30:34] .= [-1, 0, 1, 0, 0]
# p[35:39] .= [ 0, 0, -1, 1, 0]
# p[40:44] .= [ 0, -1, 0, -1, 1]
# return p
# end
function predict_neuralode(u0, p)
pred = clamp.(Array(prob_neuralode(u0, p)), -ub, ub)
return pred
end
function loss_neuralode(p, i_exp)
u0 = u0_list[i_exp, :]
ode_data = ode_data_list[i_exp]
pred = predict_neuralode(u0, p)
loss = sum(abs2, ode_data .- pred)
return loss
end
function loss_pred_neuralode(p, i_exp)
u0 = u0_list[i_exp, :]
ode_data = ode_data_list[i_exp]
pred = predict_neuralode(u0, p)
loss = sum(abs2, ode_data .- pred)
return loss, pred
end
function display_p(p)
println("s1")
println(p[1:4])
println("s2")
println(p[5:8])
println("s3")
println(p[9:12])
println("s4")
println(p[13:16])
println("s5")
println(p[17:20])
end
# Callback function to observe training
cbi = function (p, i_exp)
ode_data = ode_data_list[i_exp]
pred = predict_neuralode(u0_list[i_exp, :], p)
list_plt = []
for i in 1:ns
plt = scatter(tsteps, ode_data[i,:], title = string(i), label = string("data_",i))
plot!(plt, tsteps, pred[i,:], label = string("pred_",i))
push!(list_plt, plt)
end
plt_all = plot(list_plt..., legend = false)
png(plt_all, string("figs/i_exp_", i_exp))
return false
end
list_loss = []
iter = 0
cb = function (p, loss_mean)
global list_loss, iter
push!(list_loss, loss_mean)
println(string("\n", "iter = ", iter, " loss = ", loss_mean, "\n"))
if iter % n_plot == 0
display_p(p)
for i_exp in [1, 10, 20, 30]
cbi(p, i_exp)
end
if iter < 2000
plt_loss = plot(list_loss, yscale = :log10, label="loss")
else
plt_loss = plot(list_loss, xscale = :log10, yscale = :log10, label="loss")
end
png(plt_loss, "figs/loss")
end
iter += 1
end
opt = ADAM(0.001)
p = clip_p(p)
loss_all = zeros(Float32, n_exp)
for epoch in 1:n_epoch
global p, iter
for i_exp in randperm(n_exp)
loss, back = Flux.pullback(loss_neuralode, p, i_exp)
update!(opt, p, back(one(loss))[1])
p = clip_p(p)
loss_all[i_exp] = loss
#println(string("epoch=", iter, " i_exp=", i_exp, " loss=", loss))
end
loss_mean = sum(loss_all)/n_exp
cb(p, loss_mean)
end
The loss function will look like Put the CRNN into an ODE integrator
BTW, the above code uses adjoint sensitivity from DiffEq. @ChrisRackauckas
Minor query that arises while going through this paper, is that the weights of the input layer are required to be positive (as stoichiometric coefficients of reactants are positive)
I couldn't find out an appropriate way to do using flux. Let me know if anyone finds this interesting
Regarding the requirements of positive weights, initially, I used p = clip_p(p)
. Some people suggest using the hook function in Zygote, but I don't know how to do it. Those days, I prefer manually defining the layers, instead of using the Dense module. Then it is easy to enforce the weights to be non-negative. For instance, use abs(w_in)
in the layer.
Another note is that you can use a non-stiff solver, such as Tsit5() here since the system is not really stiff. such that a non-stiff solver might be faster.
BTW, it is straightforward to hybrid Catalysis.jl and CRNN. i.e., combine the known pathways and unknown pathways into the function dydt
.
Hello @jiweiqi , glad you are here. I know Julia but am new to Sci_ML framework ,and getting acquainted with it. The syntax is definitely much more friendlier than the Pytorch code. I have more queries related to the method itself in the paper and the code, which I will mail you in detail soon today.
Meanwhile, I am playing around with this code, and seem to run into following warning in first couple of iterations denoted by iter
in the code.
dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable.
Glad to hear more discussions from you.
For dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable.
,
what ODE solver you are using? Tsit5() or rosenbrock?
Rosenbrock(autodiff=false)
as in the code
I remember this code runs successfully at that time. It could be due to the versions changes of Julia and DiffEq. Meanwhile, is the code still continuing with the error message? If it is still continuing, then you can ignore it.
It could be due to the initial network you get? Or if you end up at some random bad parameters you can hit that and have the optimizer reject the parameters.
The code is still running `iter = 53 loss = 3.6875534
┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable. └ @ DiffEqBase ~/.julia/packages/DiffEqBase/zXRWE/src/integrator_interface.jl:343
iter = 54 loss = 3.6740432 iter = 55 loss = 3.6123388 iter = 56 loss = 3.5788953 iter = 57 loss = 3.5316973 iter = 58 loss = 3.6371818`
but the warning message occurs intermittently. The weights(coefficients ) seem to be not converging though `iter = 60 loss = 3.4807959
s1 Float32[0.04838549, 0.0, 0.052780855, 0.0818088] s2 Float32[0.028086396, 0.031885616, 0.34444085, 0.028887317] s3 Float32[0.7293116, 0.00031588576, 0.27249622, 0.041405488] s4 Float32[0.13578044, 0.26466164, 0.029718699, 0.029005153] s5 Float32[0.088181436, 0.34516084, 0.25139907, 0.74834317]`
i.e. say for first reaction, (first column of above matrix of rows) more that two species have their coefficients approaching 1.0 as can be seen in following iteration
`iter = 80 loss = 2.7691839
s1 Float32[0.048153307, 0.0, 0.072791025, 0.08247974] s2 Float32[0.03963388, 0.0644407, 0.2503924, 0.024419485] s3 Float32[0.7418091, 0.00031526515, 0.27819398, 0.042485863] s4 Float32[0.13934119, 0.26573816, 0.025210617, 0.028881501] s5 Float32[0.10496614, 0.34978426, 0.24204765, 0.74635804]`
Oh. yes, couple potential reasons:
log
operation, the concentrations are clipped, which could induce kind of discontinuations and impose challenges to the ODE solver. (I am not sure about this point.)I was not able to get rid of those warnings. I am working on some tricks and they might improve it. For example, better initialization. Yingbo suggested small initial weights benefit the training of neural ODE, in general. Say the weights to be 1e-5 ... But in general, the warning doesn't effect the training results too much. As long as the loss function is dropping, and the weights are converging to a good one, it is great.
Another trick that works around is to use larger tolerance for the ODE solver initially, and then gradually tighten the tolerance.
Rosenbrock(autodiff=false)
as in the code
I am also running the code in my PC with Julia 1.5.1. It seems that alg = Tsit5()
is faster for this problem and no warning raised.
The loss is dropping although much slowly, (coefficients are not yet converging to the expected values) as compared to your Pytorch code. Quick questions about the code.
NeuralODE
in this model is not clear. Pytorch code seemed to do pretty well, and quickly in predicting the weights and very much in-line with the description in the paper. I imagined a Julia code would look simply like following :) ( please consider this as pseudo-code)X_train is data from ODESolution (for 30 expts) of size 5 x 30*length(t)
Y_train is RHS of the ODE system (for same 30 expts) of size 5 x 30*length(t)
train_loader = Flux.Data.DataLoader((X_train, Y_train), batchsize=5, shuffle=true)
using Flux
CRNN = Chain((x, p)->log.(clamp.(x, lb, ub)),Dense(num_species,num_reaction,exp), Dense(num_reaction,num_species))
p = Flux.params(CRNN)
# paramters W, b of neural-net
loss(x, y) = Flux.Losses.mse(CRNN(x), y)
# define loss function as MSE
evalcb = () -> @show(loss(X_train,Y_train)) #
# call-bacl function for monitoring the loss
Flux.train!(loss, p, ncycle(train_loader, 10000), ADAM(0.001), cb = Flux.throttle(evalcb, 1))
# training on extended data-set (or dataset with epochs)
# with ADAM optimiser, call back prints loss at every 1 seconds
p[25:29] .= [-2, 1, 0, 0, 0] p[30:34] .= [-1, 0, 1, 0, 0] p[35:39] .= [ 0, 0, -1, 1, 0] p[40:44] .= [ 0, -1, 0, -1, 1]
pred
between lb
to ub
, instead of -ub
to ub
? This comes from the thought that concentration of species should always be positive . Or it is that you have clamped it this way as the log
of concentrations are involved ?
function predict_neuralode(u0, p)
pred = clamp.(Array(prob_neuralode(u0, p)), -ub, ub)
return pred
end
Rosenbrock(autodiff=false)
as in the codeI am also running the code in my PC with Julia 1.5.1. It seems that
alg = Tsit5()
is faster for this problem and no warning raised.
I am running with Julia 1.5.2, will try with alg = Tsit5() also with it
Sure, you can do it the same way as in the paper. The benefit of neural ode is that it works when only part of the species is measured. But, again, if you have all of the species measured, the approach in the paper might be more efficient.
The output layer is not initialized there. That part of code was for debugging and it is actually not being used, right?
The loss function will look like Put the CRNN into an ODE integrator
I obtained following training results, (had to interrupt as the loss was increasing) using Rosenbrock23
. Running for Tsit5() also
and the loss function looked something like this
The output layer is not initialized there. That part of code was for debugging and it is actually not being used, right?
Sorry @jiweiqi ,my mistake.I got confused. I realise set_p(p)
is not being called at all in the code.Got it confused with names clip_() and
set_(). You are right, it should be used later for debugging step later.
The loss function will look like Put the CRNN into an ODE integrator
I obtained following training results, (had to interrupt as the loss was increasing) using
Rosenbrock23
. Running for Tsit5() also and the loss function looked something like this
Yeah. your loss looks good. Give it some time and run for 1E4 iterations (based on my plot).
@jiweiqi couple more questions,
1) any comment on reason for clamping (upper bound ub)
of model using FastChain function as log(10) ?
2) predict_neuralode(u0, p) function also clamps from -ub to ub
, Any particular reason to also assign negative lower bound ?
3) any general guideline on determining this upper bound for certain reaction pathways ?
@jiweiqi couple more questions,
- any comment on reason for clamping
(upper bound ub)
of model using FastChain function as log(10) ?
Sometimes, x could be very large if the reaction rate constants are very large, although, in reality, this usually not the case, unless there is a kind of explosion, as the explosion of COVID case numbers. For this particular case, it seems that having an upper bound helps to stabilize the training.
- predict_neuralode(u0, p) function also clamps from
-ub to ub
, Any particular reason to also assign negative lower bound?
Similar to the Q1. employing bounds here will bound the loss function and the potential exploding gradients. In SGD/Adam, we would like the gradients not too large. So both lower bound and upper bounds are needed. Putting a lower bound of zero would make the loss function insensitive to the predicted negative concentrations. For example, one parameter lead to x = -10, and another one lead to x = -1000. We would like the loss function could reflect the differences between them.
- any general guideline on determining this upper bound for certain reaction pathways?
Couple reference values: (1) The maximum concentrations in the systems, which can be obtained from the measured time-series data. (2) The bound will help bound the gradient values, this can be determined by diagnosing the parameters' gradient. For instance, choose an upper bound that helps to clip the gradients.
Hi @yewalenikhil65 , are you able to get the expected weights now?
Hi @yewalenikhil65 , are you able to get the expected weights now?
Hello Sir @jiweiqi Thank you for letting me know about those lower and upper bound details. I ran the code twice, using Tsit5() alg.. First, the losses did converged for 1e04 iterations ,as per your plots (and also the concentrations by training matched the trueODEsolution ), weights did not converged exactly as shown in the paper, but only approximately (as atached below)
Second time, I re-ran the same code , but encountered completely different behavior of the loss function as shown below in the figure. The losses kept being constant even after iterations 350. (Its taking huge time to run as my laptop isn't of very good configuration and crashes down frequently, thankfully, I am getting access to a good PC tomorrow.)
Regarding, the physical system I was trying to model using NN: I will mail tomorrow , was gathering some details and references
Not sure if this is the right place to post, but posting here since Catalyst seems only place related to Chemical reactions. I recently across paper deriving unknown chemical reaction pathways using neural-net
https://arxiv.org/abs/2002.09062
Paper discusses physically interpretable chemical reaction neural network (CRNN) based on law of mass action as
aA + bB --> cC +dD
whose rate of reaction can be expressed using law of mass action asR = exp(ln(k) + a ln[A] + b ln [B] + c ln [C] + d[D])
So one can imagine activation function as exponential function , weights of input layers as stoichiometric coefficients of reactants , inputs as ln ([species]) and bias as ln[rate constant]
Although not a generalized neural net frame for ODEs , this could really be good addition in Sci_ML/Catalyst related to Chemical reactions.
I am currently trying to implement this using Julia
Minor query that arises while going through this paper, is that the weights of input layer are required to be positive (as stoichiometric coefficients of reactants are positive)
I couldn't find out an appropriate way to do using flux. Let me know if anyone finds this interesting