Closed: jessebett closed 5 years ago
I see the following error, which appears to be batch related (and only happens with `neural_ode`, not `neural_ode_rd`):
```julia
using Flux, DiffEqFlux, DifferentialEquations
using Flux: mse
using Base.Iterators: repeated

struct NeuralODE
    layer
    time::Float32
end
Flux.@treelike NeuralODE

# this is to remove the saveat dimension
squeeze_final(s) = dropdims(s, dims=ndims(s))

(m::NeuralODE)(x) = begin
    squeeze_final(neural_ode(m.layer, x, (Float32(0.0), m.time), Tsit5(), saveat=[m.time], dt=0.05))
end

model = Chain(
    Dense(2, 4, tanh),
    NeuralODE(Dense(4, 4, tanh), Float32(1.0)),
    Dense(4, 2));

function train!(model, data_size)
    X = rand(Float32, 2, data_size)
    Y = rand(Float32, 2, data_size)
    loss(x, y) = mse(model(x), y)
    Flux.train!(loss, params(model), repeated((X, Y), 10), ADAM());
end

print("training with batch size 1\n")
train!(model, 1) # batch size 1: this works
print("training with batch size 2\n")
train!(model, 2) # batch size 2: does not work!
print("done\n")
```
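As an aside to the snippet above, `squeeze_final` just strips the trailing singleton dimension that `saveat=[m.time]` adds to the solver output, so the layer maps `(statesize, batchsize)` inputs back to `(statesize, batchsize)`. A minimal illustration of that reshaping step on a dummy array:

```julia
# neural_ode with a single saveat time returns an extra trailing time
# dimension of size 1; dropdims removes it.
squeeze_final(s) = dropdims(s, dims=ndims(s))

A = rand(Float32, 4, 16, 1)   # (statesize, batchsize, saveat)
println(size(squeeze_final(A)))  # (4, 16)
```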
The stacktrace begins:

```
training with batch size 1
training with batch size 2
ERROR: LoadError: DimensionMismatch("array could not be broadcast to match destination")
Stacktrace:
 [1] check_broadcast_shape at ./broadcast.jl:456 [inlined]
 [2] check_broadcast_axes at ./broadcast.jl:459 [inlined]
 [3] instantiate at ./broadcast.jl:258 [inlined]
 [4] materialize!(::SubArray{Float32,1,Array{Float32,1},Tuple{UnitRange{Int64}},true}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(-),Tuple{SubArray{Fl
```
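For context, that `DimensionMismatch` message is Julia's generic shape check for in-place broadcasting: the destination (here a `SubArray` view) and the source of the broadcast disagree in size. A standalone illustration of the same failure mode, unrelated to the ODE machinery:

```julia
# An in-place broadcast (.=) requires the source to fit the destination.
dest = zeros(Float32, 2)
src  = ones(Float32, 3)
try
    dest .= src   # destination has 2 elements, source has 3
catch e
    println(e)    # a DimensionMismatch about broadcasting to the destination
end
```

So somewhere in the adjoint pass, a gradient buffer is being written with a source whose length does not match the view it targets, which is consistent with the problem appearing only once `batchsize > 1`.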
What are the dimensions on the output of the neural ODE here, and what are you expecting it to be?
For evaluation, it's the same for both `neural_ode` and `neural_ode_rd`: the `NeuralODE` layer takes a vector, evolves it for the given time, and returns the final evolved vector. So it's a map from `(statesize, batchsize)` to `(statesize, batchsize)`. We can verify this is what is happening:
```julia
layer = NeuralODE(Dense(4, 4, tanh), Float32(1.0))
print(size(layer(rand(Float32, 4, 16)))) # prints (4, 16)
```
For training, something bad is happening; I just don't know what. `neural_ode` works, `neural_ode_rd` doesn't. (I thought they were supposed to be interchangeable, with different emphases on performance.)
Here is a 1D problem where the model is trying to learn the function `f(u) = u.^3`. The data is a `(1, 200)`-dimensional array, where 200 is the batch size. The `cb()` shows that this is able to do a forward pass; however, the reverse pass returns the error:

This is the standard way Flux expects batched data. For example, if the model was just simply the `dudt` chain, and not integrating that with a solver, i.e. `loss(x, y) = mean(abs.(dudt(x) - y))`, this works and trains fine.

Additionally, if I try `neural_ode_rd` instead, even the forward pass (`cb()`) won't work and it will return the error:
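The 1D setup described above might look like the following hedged sketch (the code itself was not included in the comment; the network shape, variable names, and training loop here are assumptions reconstructed from the description):

```julia
using Flux, DiffEqFlux, DifferentialEquations
using Statistics: mean

batch = 200
u0 = rand(Float32, 1, batch)        # (1, 200): standard Flux batched data
y  = u0 .^ 3                        # target function f(u) = u.^3

dudt = Dense(1, 1)                  # toy dynamics network
n_ode(x) = dropdims(neural_ode(dudt, x, (0.0f0, 1.0f0), Tsit5(), saveat=[1.0f0]),
                    dims=3)         # strip the saveat dimension, as above

loss() = mean(abs.(n_ode(u0) .- y))
cb() = @show(loss())                # forward pass succeeds...
Flux.train!(loss, params(dudt), Iterators.repeated((), 10), ADAM(), cb=cb)
# ...but, per the report, the reverse pass fails with the DimensionMismatch,
# and swapping in neural_ode_rd fails already in the forward pass.
```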