Flesh out documentation

ChrisRackauckas commented 4 years ago

There's a lot of parts of the documentation that can be better described:

[ ] MNIST example can get a lot more text on what it's doing. Especially how it's performing minibatching.
[ ] Every tutorial should have pictures. Before and after training pictures kind of thing.
[ ] Benchmarks can get a lot more in there.
[ ] We should have written down LaTeX forms of the equations we're fitting and have pictures of neural networks mixed in for each of the examples.
[ ] It would be nice to get some animations in there.
[ ] How mass matrix ODEs impose conditions could get a lot more detail.
[ ] An example from a Hamiltonian or a graph network, i.e. https://github.com/SciML/DiffEqFlux.jl/issues/80
[ ] It would be nice to get a PDE example in there, maybe just copying the one from the UDE stuff.
[ ] Make sure every tutorial is covered by tests of some sort.
[ ] Clean up minibatching docs: https://github.com/SciML/DiffEqFlux.jl/pull/267

JeremyFongSP commented 4 years ago

For the Benchmarks, do you mean add more benchmarks or more text describing how it works/what it is doing? For the mass matrix ODE, is this LL-NN-Stiff.md?

ChrisRackauckas commented 4 years ago

For the Benchmarks, do you mean add more benchmarks or more text describing how it works/what it is doing?

Yeah, some more benchmarks between adjoints, maybe getting a copy of some of the tutorials running in torchdiffeq to see the timing difference.

ChrisRackauckas commented 4 years ago

For the mass matrix ODE, is this LL-NN-Stiff.md?

yes

ChrisRackauckas commented 4 years ago

Would be good to add a PDE-constrained optimized example:

using DelimitedFiles,Plots
using DiffEqSensitivity, OrdinaryDiffEq, Zygote, Flux, DiffEqFlux, Optim

# Problem setup parameters:
Lx = 10.0
x  = 0.0:0.01:Lx
dx = x[2] - x[1]
Nx = size(x)

u0 = exp.(-(x.-3.0).^2) # I.C

## Problem Parameters
p        = [1.0,1.0]    # True solution parameters
xtrs     = [dx,Nx]      # Extra parameters
dt       = 0.40*dx^2    # CFL condition
t0, tMax = 0.0 ,1000*dt
tspan    = (t0,tMax)
t        = t0:dt:tMax;

## Definition of Auxiliary functions
function ddx(u,dx)
    """
    2nd order Central difference for 1st degree derivative
    """
    return [[zero(eltype(u))] ; (u[3:end] - u[1:end-2]) ./ (2.0*dx) ; [zero(eltype(u))]]
end

function d2dx(u,dx)
    """
    2nd order Central difference for 1st degree derivative
    """
    return [[zero(eltype(u))]; (u[3:end] - 2.0.*u[2:end-1] + u[1:end-2]) ./ (dx^2); [zero(eltype(u))]]
end

## ODE descrition of the Physics:
function burgers(u,p,t)
    # Model paramters
    a0, a1 = p
    dx,Nx = xtrs #[1.0,3.0,0.125,100]
    #du .= a0 .* ddx(u.^2, dx) + a1 .* d2dx(u, dx) #.*ddx(ddx(u, dx) ,dx)
    return 2.0*a0 .* u +  a1 .* d2dx(u, dx) #.*ddx(ddx(u, dx) ,dx)
end

# Testing Solver on linear PDE
prob = ODEProblem(burgers,u0,tspan,p)
sol = concrete_solve(prob,Tsit5(), dt=dt,saveat=t);
#sol = solve(prob, AutoTsit5(Rosenbrock23()),dt=dt,tstop=tMax);
#sol  = solve(prob, AutoVern9(Rodas4()), dt=dt,tstop=tMax);
#sol = concrete_solve(prob,Tsit5(),u0,p,saveat=t0:dt:tMax,abstol=1e-8, dt=dt,
#                 reltol=1e-6,sensealg=InterpolatingAdjoint(checkpointing=true));

plot(x, sol.u[1], lw=3, label="t0", size=(800,500))
plot!(x, sol.u[end],lw=3, ls=:dash, label="tMax")

ps  = [0.1, 0.2];   # Initial guess for model parameters
# Defining Neural Net for HRT
function predict(θ)  # Our 1-layer neural network
    Array(concrete_solve(prob,Tsit5(),u0,θ,dt=dt,saveat=t))
    #Array(concrete_solve(prob,Tsit5(),u0,p,saveat=t0:dt:tMax,abstol=1e-8, dt=dt,
    #             reltol=1e-6,sensealg=InterpolatingAdjoint(checkpointing=true)))
end

## Defining Loss function
function loss(θ)
    pred = predict(θ)
    l = predict(θ)  - sol
    return sum(abs2, l), pred # Mean squared error
end

l,pred   = loss(ps)
size(pred), size(sol), size(t) # Checking sizes

Data   = Iterators.repeated((), nIters) # Creating an iterator
LOSS  = []                              # Loss accumulator
PRED  = []                              # prediction accumulator
PARS  = []                              # parameters accumulator

cb = function (θ,l,pred) #callback function to observe training
  display(l)
  append!(PRED, [pred])
  append!(LOSS, l)
  append!(PARS, [θ])
  false
end

cb(ps,loss(ps)...) # Testing callback function

# Let see prediction vs. Truth
scatter(sol[:,end], label="Truth", size=(800,500))
plot!(PRED[end][:,end], lw=2, label="Prediction")

res = DiffEqFlux.sciml_train(loss, ps, ADAM(0.01), cb = cb, maxiters = 100)  # Let check gradient propagation
ps = res.minimizer
res = DiffEqFlux.sciml_train(loss, ps, BFGS(), cb = cb, maxiters = 100)  # Let check gradient propagation
@show res.minimizer # returns [0.999999999999975, 1.0000000000000213]

JeremyFongSP commented 4 years ago

Would be good to add a PDE-constrained optimized example:

using DelimitedFiles,Plots
using DiffEqSensitivity, OrdinaryDiffEq, Zygote, Flux, DiffEqFlux, Optim

# Problem setup parameters:
Lx = 10.0
x  = 0.0:0.01:Lx
dx = x[2] - x[1]
Nx = size(x)

u0 = exp.(-(x.-3.0).^2) # I.C

## Problem Parameters
p        = [1.0,1.0]    # True solution parameters
xtrs     = [dx,Nx]      # Extra parameters
dt       = 0.40*dx^2    # CFL condition
t0, tMax = 0.0 ,1000*dt
tspan    = (t0,tMax)
t        = t0:dt:tMax;

## Definition of Auxiliary functions
function ddx(u,dx)
    """
    2nd order Central difference for 1st degree derivative
    """
    return [[zero(eltype(u))] ; (u[3:end] - u[1:end-2]) ./ (2.0*dx) ; [zero(eltype(u))]]
end

function d2dx(u,dx)
    """
    2nd order Central difference for 1st degree derivative
    """
    return [[zero(eltype(u))]; (u[3:end] - 2.0.*u[2:end-1] + u[1:end-2]) ./ (dx^2); [zero(eltype(u))]]
end

## ODE descrition of the Physics:
function burgers(u,p,t)
    # Model paramters
    a0, a1 = p
    dx,Nx = xtrs #[1.0,3.0,0.125,100]
    #du .= a0 .* ddx(u.^2, dx) + a1 .* d2dx(u, dx) #.*ddx(ddx(u, dx) ,dx)
    return 2.0*a0 .* u +  a1 .* d2dx(u, dx) #.*ddx(ddx(u, dx) ,dx)
end

# Testing Solver on linear PDE
prob = ODEProblem(burgers,u0,tspan,p)
sol = concrete_solve(prob,Tsit5(), dt=dt,saveat=t);
#sol = solve(prob, AutoTsit5(Rosenbrock23()),dt=dt,tstop=tMax);
#sol  = solve(prob, AutoVern9(Rodas4()), dt=dt,tstop=tMax);
#sol = concrete_solve(prob,Tsit5(),u0,p,saveat=t0:dt:tMax,abstol=1e-8, dt=dt,
#                 reltol=1e-6,sensealg=InterpolatingAdjoint(checkpointing=true));

plot(x, sol.u[1], lw=3, label="t0", size=(800,500))
plot!(x, sol.u[end],lw=3, ls=:dash, label="tMax")

ps  = [0.1, 0.2];   # Initial guess for model parameters
# Defining Neural Net for HRT
function predict(θ)  # Our 1-layer neural network
    Array(concrete_solve(prob,Tsit5(),u0,θ,dt=dt,saveat=t))
    #Array(concrete_solve(prob,Tsit5(),u0,p,saveat=t0:dt:tMax,abstol=1e-8, dt=dt,
    #             reltol=1e-6,sensealg=InterpolatingAdjoint(checkpointing=true)))
end

## Defining Loss function
function loss(θ)
    pred = predict(θ)
    l = predict(θ)  - sol
    return sum(abs2, l), pred # Mean squared error
end

l,pred   = loss(ps)
size(pred), size(sol), size(t) # Checking sizes

Data   = Iterators.repeated((), nIters) # Creating an iterator
LOSS  = []                              # Loss accumulator
PRED  = []                              # prediction accumulator
PARS  = []                              # parameters accumulator

cb = function (θ,l,pred) #callback function to observe training
  display(l)
  append!(PRED, [pred])
  append!(LOSS, l)
  append!(PARS, [θ])
  false
end

cb(ps,loss(ps)...) # Testing callback function

# Let see prediction vs. Truth
scatter(sol[:,end], label="Truth", size=(800,500))
plot!(PRED[end][:,end], lw=2, label="Prediction")

res = DiffEqFlux.sciml_train(loss, ps, ADAM(0.01), cb = cb, maxiters = 100)  # Let check gradient propagation
ps = res.minimizer
res = DiffEqFlux.sciml_train(loss, ps, BFGS(), cb = cb, maxiters = 100)  # Let check gradient propagation
@show res.minimizer # returns [0.999999999999975, 1.0000000000000213]

Is this code done or is there something more to add? Should I make a new markdown file?

ChrisRackauckas commented 4 years ago

Yeah it should just work. It just needs the comments cleaned up and some explanations

ChrisRackauckas commented 4 years ago

An example with complex numbers from @seadra:

using DiffEqSensitivity, OrdinaryDiffEq, Zygote, LinearAlgebra, FiniteDiff, Test, DiffEqFlux, Optim

const T = 10.0;
const ω = π/T;

function f(u, p, t)
    a = p[1]*sin(ω*t) + p[2]*sin(2*ω*t) + p[3]*sin(3*ω*t) + p[4]*sin(4*ω*t);
    A = [1.0 a; a -1.0];
    return -im*A*u;
end

u0 = [Complex{Float64}(1) 0; 0 1];

tspan = (0.0, T)
p = Complex{Float64}[0.7, 1.0, 3.0, 1.0]

prob_ode = ODEProblem{false}(f, u0, tspan, p);
sol_ode = solve(prob_ode, Tsit5());

utarget = [Complex{Float64}(0) im; im 0];

function predict_adjoint(p)
  return solve(prob_ode, Tsit5(), p=p, abstol=1e-12, reltol=1e-12)
end

function loss_adjoint(p)
    prediction = predict_adjoint(p)
    usol = last(prediction)
    loss = 1.0 - abs(tr(usol*utarget')/2)^2
    return loss
end

grad1 = Zygote.gradient(loss_adjoint,Complex{Float64}[1.5, 1.0, 3.0, 1.0])[1]
grad2 = FiniteDiff.finite_difference_gradient(loss_adjoint,Complex{Float64}[1.5, 1.0, 3.0, 1.0])

@test grad1 ≈ grad2 # <- succeeds

res = DiffEqFlux.sciml_train(loss_adjoint, p, BFGS(initial_stepnorm = 0.0001)) # <- fails
res.minimizer # Complex{Float64}[0.3601285857150311 + 0.0im, 0.9525963326731084 + 0.0im, 3.2163299056173216 + 0.0im, 0.9696107069145927 + 0.0im]
loss_adjoint(res.minimizer) # 1.2534417948018017e-12

ChrisRackauckas commented 4 years ago

I think in general this is done, and specifics should get an issue. Thanks everyone!

SciML / DiffEqFlux.jl

Flesh out documentation #244