Closed by ChrisRackauckas 1 year ago.
Starter code (Lotka-Volterra):
using OrdinaryDiffEq, NeuralNetDiffEq, Plots, Flux  # Flux is needed for Chain/Dense/σ/Descent
function f(du,u,p,t)
du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
[p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end
p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())
chain = Flux.Chain(Dense(1,32, σ), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,256, σ), Dense(256,1028, tanh), Dense(1028,1028, tanh), Dense(1028,length(u0)))
opt = Flux.Descent(0.00000001)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt),maxiters = 100, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
From GCI:
using OrdinaryDiffEq, NeuralNetDiffEq, Plots, Flux
function f(du,u,p,t)
du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
[p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end
p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())
opt = ADAM(1e-03) #1e-04
# opt = NADAM()
# opt = Nesterov()
# opt = AMSGrad()
# chain = Chain(x -> reshape(x, length(x), 1, 1), Conv((1,), 1=>16, relu), Conv((1,), 16=>8, relu), x -> reshape(x, :, size(x, 4)), Dense(8, 10), softmax)
chain = Chain(
x -> reshape(x, length(x), 1, 1),
MaxPool((1,)),
Conv((1,), 1=>16, relu),
Conv((1,), 16=>16, relu),
Conv((1,), 16=>32, relu),
Conv((1,), 32=>64, relu),
Conv((1,), 64=>256, relu),
Conv((1,), 256=>256, relu),
Conv((1,), 256=>1028, relu),
Conv((1,), 1028=>1028),
x -> reshape(x, :, size(x, 4)),
Dense(1028, 512, tanh),
Dense(512, 128, relu),
Dense(128, 64, tanh),
Dense(64, 2),
softmax)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt),maxiters = 100, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
It's still monotonic. For some reason things keep coming out monotonic!
using OrdinaryDiffEq, DiffEqFlux, NeuralNetDiffEq, Plots, Optim, Flux
function f(du,u,p,t)
du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
[p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end
p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())
N = 128
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
opt = BFGS()
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=false,diffmode=DiffEqFlux.ReverseDiffMode()),
maxiters = 1000, verbose = true,
dt=1/5f0)
plot(true_sol)
plot!(sol)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=true),maxiters = 1000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
#=
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=true,diffmode=DiffEqFlux.ForwardDiffMode()),
maxiters = 1000, verbose = true,
dt=1/5f0)
plot(true_sol)
plot!(sol)
=#
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=false,diffmode=DiffEqFlux.TrackerDiffMode()),
maxiters = 1000, verbose = true,
dt=1/5f0)
plot(true_sol)
plot!(sol)
The above shows that using ReverseDiff fixes the convergence. It looks like Zygote is dropping gradients, specifically:
https://github.com/FluxML/Zygote.jl/issues/231 https://github.com/FluxML/Zygote.jl/issues/314
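For reference, one hedged way to check the dropped-gradient hypothesis is to cross-check the two AD backends on the same loss; the closure below is only a stand-in for the NNODE training loss, not the actual one:
using Zygote, ReverseDiff
# stand-in loss over a parameter vector; in this issue it would be the NNODE training loss closure
loss(θ) = sum(abs2, sin.(θ) .- θ .^ 2)
θ_demo = rand(Float32, 16)
g_zy = Zygote.gradient(loss, θ_demo)[1]   # in a real check, `nothing` or unexpected zeros here would indicate a dropped gradient
g_rd = ReverseDiff.gradient(loss, θ_demo)
maximum(abs, g_zy .- g_rd)                # ≈ 0 when the two backends agree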
@dhairyagandhi96
Looks like Zygote is dropping gradients, specifically:
this should be fixed on zygote master
Has it? Those issues weren't closed, though?
Yeah, we'd closed both of those issues with the closures; this seems different from that.
https://github.com/FluxML/Zygote.jl/pull/510 needs to get rebased then. Whoever can force push, could you please?
I was getting a tracked array type mismatch error with the MWE here; is there some setup to see a clearer error? This is with the masters of all the relevant packages.
I'm tagging everything; it's just Zygote that needs a special branch. There shouldn't be tracked arrays except with ReverseDiff?
https://github.com/JuliaDiffEq/NeuralNetDiffEq.jl/pull/70 shows that naive training of a big network still doesn't work that well, though it's getting closer:
using OrdinaryDiffEq, DiffEqFlux, NeuralNetDiffEq, Plots, Optim, Flux
function f(du,u,p,t)
du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
[p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end
p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())
N = 128
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
opt = ADAM(0.000001)
θ = DiffEqFlux.initial_params(chain)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,θ,autodiff=true),maxiters = 10000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
using CuArrays
function f(u,p,t)
cu([p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]])
end
p = cu(Float32[1.5,1.0,3.0,1.0])
u0 = cu(Float32[1.0,1.0])
N = 512
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
θ = cu(DiffEqFlux.initial_params(chain))
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
sol = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,θ,autodiff=false),maxiters = 10000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)
I think what is really required is an investigation of new training strategies. I'll open an issue about this.
Still can't do this one. Lotka-Volterra is the hardest one 😅
using NeuralPDE, OrdinaryDiffEq, DiffEqFlux, Flux, OptimizationPolyalgorithms
function f(u, p, t)
[p[1] * u[1] - p[2] * u[1] * u[2], -p[3] * u[2] + p[4] * u[1] * u[2]]
end
p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob_oop = ODEProblem{false}(f, u0, (0.0, 3.0), p)
true_sol = solve(prob_oop, Tsit5())
N = 1028
chain = FastChain(FastDense(1, N, softplus), FastDense(N, N, softplus), FastDense(N, N, softplus),
FastDense(N, N, softplus), FastDense(N, N, softplus), FastDense(N, N, softplus),
FastDense(N, N, softplus), FastDense(N, length(u0), softplus))
opt = ADAM(0.001)
θ = Float64.(DiffEqFlux.initial_params(chain))
alg = NeuralPDE.NNODE(chain, opt, θ; strategy = StochasticTraining(2000))
sol = solve(prob_oop, alg, verbose=true, maxiters = 200)
using Plots
plot(sol)
plot!(true_sol)
This is because the information propagates forward in time from the initial values, according to the ODE system. Introducing points across a large time span will not help convergence. This is a simple problem that does not require a large network. NNODE should have "time stepping" like other ODE solvers. This problem can be easily solved if we train on [0, 2] first and then train on [0, 3]. Or we can save the prediction at t = 2 and retrain the NN on [2, 3].
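A minimal sketch of the second idea (save the prediction at the end of the first window, then retrain on the rest), reusing chain, opt, θ, f, u0, and p from the snippet above; how the stages should share or warm-start parameters is exactly the open design question:
# stage 1: train on [0, 2] only
alg = NeuralPDE.NNODE(chain, opt, θ; strategy = StochasticTraining(2000))
prob_1 = ODEProblem{false}(f, u0, (0.0, 2.0), p)
sol_1 = solve(prob_1, alg, verbose = true, maxiters = 200)
# stage 2: use the stage-1 prediction at the end of the window as the new initial condition
u_mid = sol_1.u[end]
prob_2 = ODEProblem{false}(f, u_mid, (2.0, 3.0), p)
sol_2 = solve(prob_2, alg, verbose = true, maxiters = 200)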
I thought I tried that. Worth giving it another try. Another thing could be to just add a weight to the loss that biases it more heavily towards the front. That would then more naturally work itself out over the course of an optimization.
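As a rough sketch of what that weighting could look like (illustrative stand-ins only: nn, trial, f_lv, and weighted_loss are not NNODE internals), an exponentially decaying time weight on the collocation residual biases the fit toward t ≈ 0:
using Flux, ForwardDiff
# trial solution u(t) = u0 .+ t .* nn(t) for the Lotka-Volterra system
nn = Chain(Dense(1, 32, tanh), Dense(32, 2))
p_lv = Float32[1.5, 1.0, 3.0, 1.0]
u0_lv = Float32[1.0, 1.0]
f_lv(u) = [p_lv[1]*u[1] - p_lv[2]*u[1]*u[2], -p_lv[3]*u[2] + p_lv[4]*u[1]*u[2]]
trial(t) = u0_lv .+ t .* nn([t])
dtrial(t) = ForwardDiff.derivative(trial, t)
ts = Float32.(range(0, 3, length = 100))
weight(t; λ = 1.0f0) = exp(-λ * t)   # heavier weight near t = 0
# weighted collocation loss on the residual of du/dt = f(u)
weighted_loss() = sum(weight(t) * sum(abs2, dtrial(t) .- f_lv(trial(t))) for t in ts)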
I would not use weighting to overcome this bias, as it would not capture the error in a narrow area no matter how it's weighted. Sampling more points is a natural way of weighting. Here is what I got:
using OrdinaryDiffEq
function f(u, p, t)
return [p[1] * u[1] - p[2] * u[1] * u[2], -p[3] * u[2] + p[4] * u[1] * u[2]]
end
p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob_oop = ODEProblem{false}(f, u0, (0.0, 3.0), p)
true_sol = solve(prob_oop, Tsit5(), saveat=0.01)
using ModelingToolkit
using Sophon, IntervalSets
using Optimization, OptimizationOptimJL, OptimizationOptimisers
@parameters t
@variables x(..), y(..)
Dₜ = Differential(t)
eqs = [Dₜ(x(t)) ~ p[1] * x(t) - p[2] * x(t) * y(t),
Dₜ(y(t)) ~ -p[3] * y(t) + p[4] * x(t) * y(t)]
domain = [t ∈ 0 .. 3.0]
bcs = [x(0.0) ~ 1.0, y(0.0) ~ 1.0]
@named lotka_volterra = PDESystem(eqs, bcs, domain, [t], [x(t), y(t)])
chain = FullyConnected(1, 1, sin; hidden_dims=6, num_layers=2)
pinn = PINN(x = chain, y = chain)
sampler = QuasiRandomSampler(1, 1) # ignore this line
strategy = NonAdaptiveTraining() # ignore this line
prob = Sophon.discretize(lotka_volterra, pinn, sampler, strategy)
# more points towards the front
data_1 = rand(1, 100)
data_2 = rand(1, 50) .+ 1.0
data_3 = rand(1, 10) .+ 2.0
data = [data_1 data_2 data_3]
prob.p[1] = data # manually change the dataset
prob.p[2] = data
function callback(p, l)
println("Current loss is: $l")
return false
end
res = solve(prob, BFGS(); callback=callback, maxiters=2000)
using Plots
phi = pinn.phi
ts = [true_sol.t...;;]
x_pred = phi.x(ts, res.u.x)
y_pred = phi.y(ts, res.u.y)
plot(vec(ts), vec(x_pred), label="x_pred")
plot!(vec(ts), vec(y_pred), label="y_pred")
plot!(true_sol)
Additional note: the BFGS optimizer gets really confused by a changing loss function. It's necessary to re-instantiate the optimizer if the weights are updated.
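A minimal sketch of that restart, assuming prob, res, and callback from the snippet above and that SciMLBase's remake is in scope (it should come along with Optimization):
# after updating the training points in prob.p, restart from the trained weights
# with a fresh BFGS() so its Hessian approximation is rebuilt for the new loss
prob_restart = remake(prob; u0 = res.u)   # remake assumed available via Optimization.jl
res2 = solve(prob_restart, BFGS(); callback = callback, maxiters = 2000)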
We should get some examples showing bigger neural networks training things like Lotka-Volterra. Might need GPUs.