SciML / DiffEqFlux.jl

Pre-built implicit layer architectures with O(1) backprop, GPUs, and stiff+non-stiff DE solvers, demonstrating scientific machine learning (SciML) and physics-informed machine learning methods
https://docs.sciml.ai/DiffEqFlux/stable
MIT License

Unable to Take Gradients with Batch Dimension #19

Closed jessebett closed 5 years ago

jessebett commented 5 years ago

Here is a 1D problem where the model is trying to learn the function f(u) = u.^3. The data is a (1, 200) array, where 200 is the batch size.

using Flux, DiffEqFlux, OrdinaryDiffEq, StatsBase, RecursiveArrayTools
using Distributions

const tspan = (0.0f0,25f0)
const RANGE = (Float32(-8.),Float32(8.))
const BS = 200

target(u) = u.^3

function gen_data(batchsize, target_fun)
    # draw a (1, batchsize) row of inputs uniformly from RANGE
    x = Float32.(rand(Uniform(RANGE...), batchsize))' |> collect
    return x, target_fun(x)
end

dudt = Chain(Dense(1,20,tanh),Dense(20,20,tanh),Dense(20,1))
function n_ode(u0)
    neural_ode(dudt,u0,tspan,Tsit5(),save_start=false,saveat=[tspan[2]],reltol=1e-5,abstol=1e-12)
end
ps = Flux.params(dudt)

loss(x,y) = mean(abs.(n_ode(x)-y))

data = Iterators.repeated(gen_data(BS,target), 10000) 
opt = ADAM(0.001)
cb = function () #callback function to observe training
    tx,ty = gen_data(BS,target)
    display(loss(tx,ty))
end

# Display the ODE with the initial parameter values.
cb()

Flux.train!(loss, ps, data, opt, cb = cb)

The cb() call shows that the forward pass works, but the reverse pass returns the error:

ERROR: MethodError: no method matching DiffEqSensitivity.ODEAdjointSensitivityFunction
... <- This method error is huge
You might have used a 2d row vector where a 1d column vector was required.
Note the difference between 1d column vector [1,2,3] and 2d row vector [1 2 3].
You can convert to a column vector with the vec() function.

This is the standard way Flux expects batched data. For example, if the model were just the dudt chain, not integrated with a solver, i.e. loss(x,y) = mean(abs.(dudt(x)-y)), this works and trains fine, as in the sketch below.
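
For reference, here is a minimal sketch of that working case (reusing dudt, gen_data, BS, and target from above; loss_direct and data_direct are just illustrative names):

loss_direct(x,y) = mean(abs.(dudt(x) .- y))   # same loss, but without the ODE solve
data_direct = Iterators.repeated(gen_data(BS,target), 1000)
Flux.train!(loss_direct, Flux.params(dudt), data_direct, ADAM(0.001))  # trains fine on (1,200) batches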

Additionally, if I try neural_ode_rd instead, even the forward pass (cb()) fails, returning the error:

ERROR: MethodError: no method matching Array{Float32,1}(::Array{Float32,2})

You might have used a 2d row vector where a 1d column vector was required.
Note the difference between 1d column vector [1,2,3] and 2d row vector [1 2 3].
You can convert to a column vector with the vec() function.
taliesinb commented 5 years ago

I see the following error, which appears to be batch-related (and only happens with neural_ode, not neural_ode_rd):

using Flux, DiffEqFlux, DifferentialEquations
using Flux: mse
using Base.Iterators: repeated

struct NeuralODE
    layer
    time :: Float32
end

Flux.@treelike NeuralODE

# this is to remove the saveat dimension
squeeze_final(s) = dropdims(s, dims = ndims(s))

(m::NeuralODE)(x) = begin
    squeeze_final(neural_ode(m.layer, x, (Float32(0.0), m.time), Tsit5(), saveat=[m.time], dt=0.05))
end

model = Chain(
  Dense(2, 4, tanh),
  NeuralODE(Dense(4, 4, tanh), Float32(1.0)),
  Dense(4, 2));

function train!(model, data_size)
    X = rand(Float32, 2, data_size)
    Y = rand(Float32, 2, data_size)
    loss(x, y) = mse(model(x), y)
    Flux.train!(loss, params(model), repeated((X, Y), 10), ADAM());
end

print("training with batch size 1\n")
train!(model, 1) # batchsize 1: this works

print("training with batch size 2\n")
train!(model, 2) # batchsize 2: does not work!

print("done\n")

The stacktrace begins:

training with batch size 1
training with batch size 2
ERROR: LoadError: DimensionMismatch("array could not be broadcast to match destination")
Stacktrace:
 [1] check_broadcast_shape at ./broadcast.jl:456 [inlined]
 [2] check_broadcast_axes at ./broadcast.jl:459 [inlined]
 [3] instantiate at ./broadcast.jl:258 [inlined]
 [4] materialize!(::SubArray{Float32,1,Array{Float32,1},Tuple{UnitRange{Int64}},true}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(-),Tuple{SubArray{Fl
ChrisRackauckas commented 5 years ago

What are the dimensions on the output of the neural ODE here, and what are you expecting it to be?

taliesinb commented 5 years ago

For evaluation it's the same for both neural_ode and neural_ode_rd: the NeuralODE layer takes a vector, evolves it for the given time, and returns the final evolved vector. So it's a map from (statesize, batchsize) to (statesize, batchsize). We can verify this is what is happening:

layer = NeuralODE(Dense(4, 4, tanh), Float32(1.0))
print(size(layer(rand(Float32, 4, 16)))) # prints (4, 16)

For training, something bad is happening, but I don't know what: neural_ode works, neural_ode_rd doesn't. (I thought they were supposed to be interchangeable, just with different performance trade-offs.)
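
To be concrete, here is a minimal sketch of the substitution I mean (assuming the NeuralODE struct and train! function above; it only swaps the neural_ode call for neural_ode_rd, keeping every other argument the same):

# same layer as before, but routed through neural_ode_rd instead of neural_ode
(m::NeuralODE)(x) = begin
    squeeze_final(neural_ode_rd(m.layer, x, (Float32(0.0), m.time), Tsit5(), saveat=[m.time], dt=0.05))
end

train!(model, 2) # re-run the batch-size-2 case to compare the two code paths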

ChrisRackauckas commented 5 years ago

Fixed by https://github.com/JuliaDiffEq/DiffEqFlux.jl/pull/38