Closed by ChrisRackauckas 1 year ago.
Starter code (Lotka-Volterra):
using OrdinaryDiffEq, NeuralNetDiffEq, Plots, Flux  # Flux is needed for Chain/Dense/σ/Descent
function f(du,u,p,t)
du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
[p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end
p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())
chain = Flux.Chain(Dense(1,32, σ), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,256, σ), Dense(256,1028, tanh), Dense(1028,1028, tanh), Dense(1028,length(u0)))
opt = Flux.Descent(0.00000001)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt),maxiters = 100, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
From GCI:
using OrdinaryDiffEq, NeuralNetDiffEq, Plots, Flux
function f(du,u,p,t)
du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
[p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end
p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())
opt = ADAM(1e-03) #1e-04
# opt = NADAM()
# opt = Nesterov()
# opt = AMSGrad()
# chain = Chain(x -> reshape(x, length(x), 1, 1), Conv((1,), 1=>16, relu), Conv((1,), 16=>8, relu), x -> reshape(x, :, size(x, 4)), Dense(8, 10), softmax)
chain = Chain(
x -> reshape(x, length(x), 1, 1),
MaxPool((1,)),
Conv((1,), 1=>16, relu),
Conv((1,), 16=>16, relu),
Conv((1,), 16=>32, relu),
Conv((1,), 32=>64, relu),
Conv((1,), 64=>256, relu),
Conv((1,), 256=>256, relu),
Conv((1,), 256=>1028, relu),
Conv((1,), 1028=>1028),
x -> reshape(x, :, size(x, 4)),
Dense(1028, 512, tanh),
Dense(512, 128, relu),
Dense(128, 64, tanh),
Dense(64, 2),
softmax)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt),maxiters = 100, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
It's still monotonic. For some reason things keep coming out monotonic!
using OrdinaryDiffEq, DiffEqFlux, NeuralNetDiffEq, Plots, Optim, Flux
function f(du,u,p,t)
du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
[p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end
p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())
N = 128
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
opt = BFGS()
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=false,diffmode=DiffEqFlux.ReverseDiffMode()),
maxiters = 1000, verbose = true,
dt=1/5f0)
plot(true_sol)
plot!(sol)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=true),maxiters = 1000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
#=
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=true,diffmode=DiffEqFlux.ForwardDiffMode()),
maxiters = 1000, verbose = true,
dt=1/5f0)
plot(true_sol)
plot!(sol)
=#
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=false,diffmode=DiffEqFlux.TrackerDiffMode()),
maxiters = 1000, verbose = true,
dt=1/5f0)
plot(true_sol)
plot!(sol)
The above shows that using ReverseDiff fixes the convergence. It looks like Zygote is dropping gradients, specifically:
https://github.com/FluxML/Zygote.jl/issues/231 https://github.com/FluxML/Zygote.jl/issues/314
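For reference, one hedged way to check the dropped-gradient hypothesis is to cross-check the two AD backends on the same loss; the closure below is only a stand-in for the NNODE training loss, not the actual one:
using Zygote, ReverseDiff
# stand-in loss over a parameter vector; in this issue it would be the NNODE training loss closure
loss(θ) = sum(abs2, sin.(θ) .- θ .^ 2)
θ_demo = rand(Float32, 16)
g_zy = Zygote.gradient(loss, θ_demo)[1]   # in a real check, `nothing` or unexpected zeros here would indicate a dropped gradient
g_rd = ReverseDiff.gradient(loss, θ_demo)
maximum(abs, g_zy .- g_rd)                # ≈ 0 when the two backends agree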
@dhairyagandhi96
Looks like Zygote is dropping gradients, specifically:
this should be fixed on zygote master
Has it? Those issues weren't closed, though?
Yeah, we'd closed both of those issues with the closures; this seems different from that.
https://github.com/FluxML/Zygote.jl/pull/510 needs to get rebased then. Whoever can force push, could you please?
I was getting a tracked array type mismatch error with the MWE here; is there some setup to see a clearer error? This is with the masters of all the relevant packages.
I'm tagging everything; it's just Zygote that needs a special branch. There shouldn't be tracked arrays except with ReverseDiff?
https://github.com/JuliaDiffEq/NeuralNetDiffEq.jl/pull/70 shows that naive training of a big network still doesn't work that well, though it's getting closer:
using OrdinaryDiffEq, DiffEqFlux, NeuralNetDiffEq, Plots, Optim, Flux
function f(du,u,p,t)
du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
[p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end
p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())
N = 128
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
opt = ADAM(0.000001)
θ = DiffEqFlux.initial_params(chain)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,θ,autodiff=true),maxiters = 10000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
using CuArrays
function f(u,p,t)
cu([p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]])
end
p = cu(Float32[1.5,1.0,3.0,1.0])
u0 = cu(Float32[1.0,1.0])
N = 512
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
θ = cu(DiffEqFlux.initial_params(chain))
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,θ,autodiff=false),maxiters = 10000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
I think what is really required is an investigation of new training strategies. I'll open an issue about this.
Still can't do this one. Lotka-Volterra is the hardest one 😅
using NeuralPDE, OrdinaryDiffEq, DiffEqFlux, Flux, OptimizationPolyalgorithms
function f(u, p, t)
[p[1] * u[1] - p[2] * u[1] * u[2], -p[3] * u[2] + p[4] * u[1] * u[2]]
end
p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob_oop = ODEProblem{false}(f, u0, (0.0, 3.0), p)
true_sol = solve(prob_oop, Tsit5())
N = 1028
chain = FastChain(FastDense(1, N, softplus), FastDense(N, N, softplus), FastDense(N, N, softplus),
FastDense(N, N, softplus), FastDense(N, N, softplus), FastDense(N, N, softplus),
FastDense(N, N, softplus), FastDense(N, length(u0), softplus))
opt = ADAM(0.001)
θ = Float64.(DiffEqFlux.initial_params(chain))
alg = NeuralPDE.NNODE(chain, opt, θ; strategy = StochasticTraining(2000))
sol = solve(prob_oop, alg, verbose=true, maxiters = 200)
using Plots
plot(sol)
plot!(true_sol)
This is because the information propagates forward in time from the initial values, according to the ODE system. Introducing points across a large time span will not help convergence. This is a simple problem that does not require a large network. NNODE should have "time stepping" like other ODE solvers. This problem can be easily solved if we train on [0, 2] first and then train on [0, 3]. Or we can save the prediction at t = 2 and retrain the NN on [2, 3].
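A minimal sketch of the second idea (save the prediction at the end of the first window, then retrain on the rest), reusing chain, opt, θ, f, u0, and p from the snippet above; how the stages should share or warm-start parameters is exactly the open design question:
# stage 1: train on [0, 2] only
alg = NeuralPDE.NNODE(chain, opt, θ; strategy = StochasticTraining(2000))
prob_1 = ODEProblem{false}(f, u0, (0.0, 2.0), p)
sol_1 = solve(prob_1, alg, verbose = true, maxiters = 200)
# stage 2: use the stage-1 prediction at the end of the window as the new initial condition
u_mid = sol_1.u[end]
prob_2 = ODEProblem{false}(f, u_mid, (2.0, 3.0), p)
sol_2 = solve(prob_2, alg, verbose = true, maxiters = 200)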
I thought I tried that. Worth giving it another try. Another thing could be to just add a weight to the loss that biases it more heavily towards the front. That would then more naturally work itself out over the course of an optimization.
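As a rough sketch of what that weighting could look like (illustrative stand-ins only: nn, trial, f_lv, and weighted_loss are not NNODE internals), an exponentially decaying time weight on the collocation residual biases the fit toward t ≈ 0:
using Flux, ForwardDiff
# trial solution u(t) = u0 .+ t .* nn(t) for the Lotka-Volterra system
nn = Chain(Dense(1, 32, tanh), Dense(32, 2))
p_lv = Float32[1.5, 1.0, 3.0, 1.0]
u0_lv = Float32[1.0, 1.0]
f_lv(u) = [p_lv[1]*u[1] - p_lv[2]*u[1]*u[2], -p_lv[3]*u[2] + p_lv[4]*u[1]*u[2]]
trial(t) = u0_lv .+ t .* nn([t])
dtrial(t) = ForwardDiff.derivative(trial, t)
ts = Float32.(range(0, 3, length = 100))
weight(t; λ = 1.0f0) = exp(-λ * t)   # heavier weight near t = 0
# weighted collocation loss on the residual of du/dt = f(u)
weighted_loss() = sum(weight(t) * sum(abs2, dtrial(t) .- f_lv(trial(t))) for t in ts)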
I would not use weighting to overcome this bias, as it would not capture the error in a narrow area no matter how it's weighted. Sampling more points is a natural way of weighting. Here is what I got:
using OrdinaryDiffEq
function f(u, p, t)
return [p[1] * u[1] - p[2] * u[1] * u[2], -p[3] * u[2] + p[4] * u[1] * u[2]]
end
p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob_oop = ODEProblem{false}(f, u0, (0.0, 3.0), p)
true_sol = solve(prob_oop, Tsit5(), saveat=0.01)
using ModelingToolkit
using Sophon, IntervalSets
using Optimization, OptimizationOptimJL, OptimizationOptimisers
@parameters t
@variables x(..), y(..)
Dₜ = Differential(t)
eqs = [Dₜ(x(t)) ~ p[1] * x(t) - p[2] * x(t) * y(t),
Dₜ(y(t)) ~ -p[3] * y(t) + p[4] * x(t) * y(t)]
domain = [t ∈ 0 .. 3.0]
bcs = [x(0.0) ~ 1.0, y(0.0) ~ 1.0]
@named lotka_volterra = PDESystem(eqs, bcs, domain, [t], [x(t), y(t)])
chain = FullyConnected(1, 1, sin; hidden_dims=6, num_layers=2)
pinn = PINN(x = chain, y = chain)
sampler = QuasiRandomSampler(1, 1) # ignore this line
strategy = NonAdaptiveTraining() # ignore this line
prob = Sophon.discretize(lotka_volterra, pinn, sampler, strategy)
# more points towards the front
data_1 = rand(1, 100)
data_2 = rand(1, 50) .+ 1.0
data_3 = rand(1, 10) .+ 2.0
data = [data_1 data_2 data_3]
prob.p[1] = data # manually change the dataset
prob.p[2] = data
function callback(p, l)
println("Current loss is: $l")
return false
end
res = solve(prob, BFGS(); callback=callback, maxiters=2000)
using Plots
phi = pinn.phi
ts = [true_sol.t...;;]
x_pred = phi.x(ts, res.u.x)
y_pred = phi.y(ts, res.u.y)
plot(vec(ts), vec(x_pred), label="x_pred")
plot!(vec(ts), vec(y_pred), label="y_pred")
plot!(true_sol)
Additional note: the BFGS optimizer gets really confused by a changing loss function. It's necessary to re-instantiate the optimizer if the weights are updated.
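A minimal sketch of that restart, assuming prob, res, and callback from the snippet above and that SciMLBase's remake is in scope (it should come along with Optimization):
# after updating the training points in prob.p, restart from the trained weights
# with a fresh BFGS() so its Hessian approximation is rebuilt for the new loss
prob_restart = remake(prob; u0 = res.u)   # remake assumed available via Optimization.jl
res2 = solve(prob_restart, BFGS(); callback = callback, maxiters = 2000)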
We should get some examples showing bigger neural networks training things like Lotka-Volterra. Might need GPUs.