SciML / DiffEqFlux.jl

Pre-built implicit layer architectures with O(1) backprop, GPUs, and stiff+non-stiff DE solvers, demonstrating scientific machine learning (SciML) and physics-informed machine learning methods
https://docs.sciml.ai/DiffEqFlux/stable
MIT License

Do Multiple Shooting #481

Closed Manas2030 closed 3 years ago

Manas2030 commented 3 years ago

Training the NN by splitting the datapoints and training on smaller intervals, to avoid getting stuck in local minima during training.

I tried to train the NN to predict values of the function f(x) = x*sin(3x). Here is the code I used to get my observations:

using DiffEqFlux, OrdinaryDiffEq, Flux, Optim, Plots

u0 = Float32[0.0]
datasize = 50

tspan = (0.0f0,10.0f0)
tsteps = range(tspan[1],tspan[2],length = datasize)

ode_data = tsteps .* sin.( 3 .* tsteps)

dudt2 = FastChain((x, p) -> x.^3,
                  FastDense(1, 16, tanh),
                  FastDense(16, 1))
prob_neuralode = NeuralODE(dudt2, tspan, Vern7(), saveat = tsteps, abstol=1e-6, reltol=1e-6)

function predict_neuralode(p)
  Array(prob_neuralode(u0, p))
end

# x = number of groups we are dividing the dataset into
function loss_neuralode(p)
           pred = predict_neuralode(p)
           tot_loss = 0
           x = 5
           for i in 1:x
               loss = sum(abs, (ode_data[Int((i-1)*(datasize/x)+1):Int(i*(datasize/x))] .- pred[Int((i-1)*(datasize/x)+1):Int(i*(datasize/x))]))
               tot_loss += loss*loss
           end
           return tot_loss, pred
       end

iter = 0
callback = function (p, l, pred; doplot = true)
  global iter
  iter += 1

  display(l)
  if doplot
    # plot current prediction against data
    plt = scatter(tsteps[:], ode_data[:], label = "data")
    scatter!(plt, tsteps[:], pred[:], label = "prediction")
    display(plot(plt))
  end

  return false
end

result_neuralode = DiffEqFlux.sciml_train(loss_neuralode, prob_neuralode.p, ADAM(0.05), cb = callback, maxiters = 300)

callback(result_neuralode.minimizer,loss_neuralode(result_neuralode.minimizer)...;doplot=true)

function final_loss(p)
           return sum(abs2, (ode_data .- prob_neuralode(u0, p)))
end

print("Final Loss:",final_loss(result_neuralode.minimizer),"\n")

Observations:

Training on the whole dataset

NN architecture and hyperparameters used:

dudt2 = FastChain((x, p) -> x.^3,
                         FastDense(1, 16, tanh),
                         FastDense(16, 1))
prob_neuralode = NeuralODE(dudt2, tspan, Vern7(), saveat = tsteps, abstol=1e-6, reltol=1e-6)

Loss function:

function loss_neuralode(p)
           pred = predict_neuralode(p)
           loss = sum(abs2, (ode_data .- pred))
           return loss, pred
end

Iterations = 300, Loss = 43845.42f0

Taking a bigger NN, that is:

dudt2 = FastChain((x, p) -> x.^3,
    FastDense(1, 32, tanh),
    FastDense(32,32,tanh),
    FastDense(32, 1))

Iterations = 300, Loss = 43836.41f0

Splitting the datapoints and training on smaller intervals

Going back to old NN which is

dudt2 = FastChain((x, p) -> x.^3,
            FastDense(1, 16, tanh),
            FastDense(16, 1))

And taking a new Loss function (that groups different points):

# x = number of groups we are dividing the dataset into
function loss_neuralode(p)
           pred = predict_neuralode(p)
           tot_loss = 0
           x = 5
           for i in 1:x
               loss = sum(abs, (ode_data[Int((i-1)*(datasize/x)+1):Int(i*(datasize/x))] .- pred[Int((i-1)*(datasize/x)+1):Int(i*(datasize/x))]))
               tot_loss += loss*loss
           end
           return tot_loss, pred
       end

Observations:
With x = 2, loss = 43907.97f0
With x = 5, loss = 44291.656f0
With x = 10, loss = 43852.062f0

Replacing tot_loss += loss*loss with tot_loss += log(cosh(loss)):
With x = 2, doesn't work (producing NaN)
With x = 5, loss = 43862.867f0
With x = 10, loss = 43873.414f0

Training on the first 30 points of the dataset

Taking datasize = 30, since that will be easier for the NN to learn.

Training on the whole dataset: Loss = 5765.231f0

Dividing dataset into groups as before:

With x = 2, loss = 5954.032f0
With x = 5, loss = 5966.1304f0
With x = 10, loss = 6044.2563f0

Again replacing tot_loss += loss*loss with tot_loss += log(cosh(loss)):

With x = 2, loss = 5941.9297f0
With x = 5, loss = 6614.8916f0
With x = 10, loss = 6424.8003f0

Manas2030 commented 3 years ago

Here are some screenshots for reference:

Training on the whole dataset, without making any groups (plot)
Training on the whole dataset but dividing it into 5 groups (plot: x = 5)
Training on the first 30 points, again divided into 5 groups (plot: x = 5, smaller dataset)

ChrisRackauckas commented 3 years ago
function loss_neuralode(p)
           pred = predict_neuralode(p)
           tot_loss = 0
           x = 5
           for i in 1:x
               loss = sum(abs, (ode_data[Int((i-1)*(datasize/x)+1):Int(i*(datasize/x))] .- pred[Int((i-1)*(datasize/x)+1):Int(i*(datasize/x))]))
               tot_loss += loss*loss
           end
           return tot_loss, pred
       end

These aren't doing multiple shooting. pred = predict_neuralode(p) is the essence of single shooting. It needs to make predictions on the subintervals. You're still only doing a single solve.

Manas2030 commented 3 years ago

I tried implementing the loss_neuralode function like this:

function predict_neuralode(u_initial,starting_timestep,p)
    Array(prob_neuralode(u_initial, p))
end

function loss_neuralode(p)
    tot_loss = 0
    #dividing the dataset in groups of 10 i.e. 1-10, 11-20, ... , 41-50
    for i in [10,20,30,40,50]
        pred = predict_neuralode(Float32[ode_data[i-9]],tsteps[i-9],p)
        loss = sum(abs, (ode_data[i-9:i] .- pred[1:10]))
        tot_loss += loss*loss
    end
    pred = predict_neuralode(u0,tsteps[1],p)
    return tot_loss, pred
end

To do multiple shooting, I am passing the first element of each group the dataset is divided into (u_initial), the timestep of that u_initial (starting_timestep), and the parameters of the NN (p). But I cannot figure out how to use the parameter starting_timestep as the time from which the NN should start predicting inside predict_neuralode. The function prob_neuralode() seems to always start predicting from t = 0.0f0.

ChrisRackauckas commented 3 years ago

Use the direct interface, i.e.

dudt2 = FastChain((x, p) -> x.^3,
                  FastDense(2, 50, tanh),
                  FastDense(50, 2))
neural_ode_f(u,p,t) = dudt2(u,p)
pinit = initial_params(dudt2)
prob = ODEProblem(neural_ode_f, u0, tspan, pinit)

See the bottom of https://diffeqflux.sciml.ai/dev/examples/neural_ode_sciml/. That will give you more control. We can then find out how to change the NeuralODE struct. It will need to be changed in order to do this, but I kind of think it would be good to find out not only that, but a better interface for multiple shooting.

Manas2030 commented 3 years ago

These are the functions I have modified

function predict_neuralode(u_initial, timesteps, p)
         tmp_prob = remake(prob, p=p, tspan=(timesteps[1],timesteps[length(timesteps)]), u0=u_initial)
         tmpq = Array(solve(tmp_prob, Tsit5(), saveat=timesteps))
         return tmpq
end
function loss_neuralode(p)
    tot_loss = 0
    # dividing the dataset in groups of 10 i.e. 1-10, 11-20, ... 
    # grp_size = 10, datasize = 20
    for i in grp_size:grp_size:datasize
        pred = predict_neuralode(Float32[ode_data[i-grp_size+1]],tsteps[i-grp_size+1:i],p)
        loss = sum(abs2, (ode_data[i-grp_size+1:i] .- pred[1:grp_size]))
        tot_loss += (loss*loss) + (loss_multiplier * abs(ode_data[i-grp_size+1]-pred[1]))
    end
    pred = predict_neuralode(u0,tsteps,p)
    return tot_loss, pred
end

These are the predictions I got from the NN in the first iteration. For the first 10 points (1st group):

Float32[0.0, 0.013333219, 0.026126323, 0.038400874, 0.050177608, 0.061476488, 0.07231665, 0.08271681, 0.09269449, 0.1022668]

For points from 11 to 20 (2nd group):

Float32[0.52629834, 0.5184088, 0.51083046, 0.5035515, 0.49656034, 0.48984596, 0.4833976, 0.477205, 0.4712582, 0.46554756]

When the initial time is set to t = tsteps[11] in the predict_neuralode function (to do multiple shooting), the prediction is 0.52629834. But if I take the initial time as tsteps[1], then my prediction for the 11th timestep is different (as shown in the graph).


So we need to figure out a way to pass a given timestep, and the initial condition at that timestep, to our NN and get predictions. But I couldn't find any functions or interfaces relevant to this.

ChrisRackauckas commented 3 years ago

When the initial time is set to t = tsteps[11] in the predict_neuralode function (to do multiple shooting), the prediction is 0.52629834. But if I take the initial time as tsteps[1], then my prediction for the 11th timestep is different (as shown in the graph).

Indeed, that continuity will only hold after the loss function is minimized, and if there is a term forcing continuity. How large is loss_multiplier ? This might be dominated by the squared term.

Manas2030 commented 3 years ago

I tried loss_multiplier with several values like 0, 10, 100, 10000, but it doesn't work. In the second iteration of for i in grp_size:grp_size:datasize, the arguments passed are (ode_data[11], tsteps[11], p). When the problem is remade inside predict_neuralode with those as the initial conditions, the prediction at the new problem's initial time (which is t = tsteps[11] for our original problem) is always 0.52629834 (so tmpq[1] = ode_data[11] = 0.52629834). So inside loss_neuralode, no matter how big loss_multiplier is, abs(ode_data[11] - pred[1]) is always 0. Therefore loss_multiplier is not doing anything.

ChrisRackauckas commented 3 years ago

Oh, that term is incorrect. It should be abs(ode_data[i+grp_size] - pred[end]). The end of the prediction should equal the initial condition of the next group. That term only becomes important once you start making the initial conditions along the trajectory parameters themselves, though.

Manas2030 commented 3 years ago

I changed the grp_size to 5 and this is the new function:

function loss_neuralode(p)
    tot_loss = 0
    for i in grp_size:grp_size:datasize
        pred = predict_neuralode(Float32[ode_data[i-grp_size+1]],tsteps[i-grp_size+1:i],p)
        loss = sum(abs2, (ode_data[i-grp_size+1:i] .- pred[1:grp_size]))
        if(i == datasize)
            tot_loss += (loss*loss)
        else
            tot_loss += (loss*loss) + (loss_multiplier * abs(ode_data[i+ grp_size]-pred[grp_size]))
        end
    end
    pred = predict_neuralode(u0,tsteps,p)
    return tot_loss, pred
end

Changing loss_multiplier didn't have much of an effect and this is the output I am getting.

(Output shown with loss_multiplier = 100.)

Also, if I do abs(ode_data[i+grp_size] - pred[grp_size]), wouldn't it calculate the distance between the first point of the next group and the prediction of the last point of the current group, which could be really far apart if there is a high slope in between?

ChrisRackauckas commented 3 years ago

Also, if I do abs(ode_data[i+grp_size] - pred[grp_size]), wouldn't it calculate the distance between the first point of the next group and the prediction of the last point of the current group, which could be really far apart if there is a high slope in between?

They should overlap at a point.
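As a rough sketch (not the code from this thread; loss_multiple_shooting and starts are hypothetical names), groups that share their boundary point would look like this, so the continuity term compares values at the same time index:

starts = 1:grp_size-1:datasize-grp_size+1   # e.g. with grp_size = 5: 1, 5, 9, ...; each group reuses
                                            # the previous group's last point (trailing remainder ignored)
function loss_multiple_shooting(p)
    tot_loss = 0f0
    for s in starts
        r = s:s+grp_size-1
        pred = predict_neuralode(Float32[ode_data[s]], tsteps[r], p)
        tot_loss += sum(abs2, ode_data[r] .- pred[1:grp_size])
        # continuity penalty: the group's last prediction vs. the shared data point
        # that also starts the next group (same index, because the groups overlap)
        tot_loss += loss_multiplier * abs(ode_data[last(r)] - pred[grp_size])
    end
    return tot_loss   # just the loss; the full-trajectory prediction for plotting is omitted here
end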

ChrisRackauckas commented 3 years ago

Take it slow: if there's one group does it work? 2?

Manas2030 commented 3 years ago

These are the predictions:

With one group (grp_size = 20): (plot)

With 2 groups (grp_size = 10): (plot)

Also, I tested it on 5 groups, since with only 2 groups the 2nd group goes into the if block if(i == datasize) tot_loss += (loss*loss), and the term (loss_multiplier * abs(ode_data[i+grp_size]-pred[grp_size])) wouldn't be added.

ChrisRackauckas commented 3 years ago

wait, why are you indexing the ODE data linearly when it's a matrix?

Manas2030 commented 3 years ago

The ode_data is of type Array{Float32,1} so I have been accessing it linearly like a normal 1D array. I didn't quite get what you meant šŸ˜…

ChrisRackauckas commented 3 years ago

Oh I see. Does this example solve fine "the normal way"?

Manas2030 commented 3 years ago

No, it still gets stuck in a local minimum. The predictions are very similar to when we take grp_size = 20 (the whole dataset). There's a picture above with the same grp_size.

ChrisRackauckas commented 3 years ago

Why is there an (x, p) -> x.^3 in front if you're trying to fit f(x) = x*sin(3x)? That doesn't seem like a good idea.

I would suggest not attempting to write new methods on new examples: start with existing examples to debug.

Manas2030 commented 3 years ago

Hi! I shifted my focus to this already-worked problem: https://diffeqflux.sciml.ai/dev/examples/local_minima/. This is the code I am working on:

using DiffEqFlux, OrdinaryDiffEq, Flux, Optim, Plots

u0 = Float32[2.0; 0.0]
datasize = 30
tspan = (0.0f0, 5.0f0)
tspan_mini = (0.0f0, 1.5f0)
datasize_mini = 10
tsteps = range(tspan[1], tspan[2], length = datasize)

function trueODEfunc(du, u, p, t)
    true_A = [-0.1 2.0; -2.0 -0.1]
    du .= ((u.^3)'true_A)'
end

prob_trueode = ODEProblem(trueODEfunc, u0, tspan)
ode_data = Array(solve(prob_trueode, Tsit5(), saveat = tsteps))

iter = 0
grp_size = 5
loss_multiplier = 1000

dudt2 = FastChain((x, p) -> x.^3,
                  FastDense(2, 16, tanh),
                  FastDense(16, 2))
neural_ode_f(u,p,t) = dudt2(u,p)
pinit = initial_params(dudt2)
prob = ODEProblem(neural_ode_f, u0, tspan_mini, pinit)
prob_neuralode = NeuralODE(dudt2, (0.0,1.6), Tsit5(), saveat = tsteps[tsteps .<= 1.6])

function predict_neuralode(u_initial, timesteps, p)
         tmp_prob = remake(prob, p=p, tspan=(timesteps[1],timesteps[length(timesteps)]), u0=u_initial)
         tmpq = Array(solve(tmp_prob, Tsit5(), saveat=timesteps))
         return tmpq
end

function loss_neuralode(p)
  pred = predict_neuralode(u0,tsteps[tsteps .<= 1.6],p)
  loss = sum(abs2, (ode_data[:,1:size(pred,2)] .- pred))
  return loss, pred
end

iter = 0
callback = function (p, l, pred; doplot = true)
  global iter
  iter += 1

  display(l)
  if doplot
    # plot current prediction against data
    plt = scatter(tsteps[1:size(pred,2)], ode_data[1,1:size(pred,2)], label = "data")
    scatter!(plt, tsteps[1:size(pred,2)], pred[1,:], label = string("pred: ",iter))
    display(plot(plt))
  end
  return false
end

iter = 0
result_neuralode = DiffEqFlux.sciml_train(loss_neuralode, prob_neuralode.p,
                                          ADAM(0.05), cb = callback,
                                          maxiters = 300)

function final_loss(p)
  return sum(abs2, (ode_data[:,1:10] .- Array(prob_neuralode(u0, p)) ))
end

print("Final Loss: ",final_loss(result_neuralode.minimizer),"\n")

Here is the result with single shooting (see plot). Loss: 1.4008017

For multiple shooting, here is the loss_neuralode function:

function loss_neuralode(p)
    tot_loss = 0
    for i in grp_size:grp_size:datasize_mini
        pred = predict_neuralode(ode_data[:,i-grp_size+1],tsteps[i-grp_size+1:i],p)
        loss = sum(abs2, (ode_data[:,i-grp_size+1:i] .- pred[:,1:grp_size]))
        if(i == datasize)
            tot_loss += (loss*loss)
        else
            tot_loss += (loss*loss) + (loss_multiplier * sum(abs,(ode_data[:,i+ grp_size]-pred[:,grp_size])))
        end
    end
    pred = predict_neuralode(u0,tsteps[tsteps .<= 1.6],p)
    return tot_loss, pred
end

Here is the result with this function (see plot). Loss: 44.763756

Replacing the difference term ode_data[:,i+grp_size] - pred[:,grp_size] with ode_data[:,i+grp_size+1] - pred[:,1] and also removing the if condition, this is the result (see plot). Loss: 2.431281

So I think the second difference: ode_data[:,i+grp_size+1] - pred[:,1] does better at predicting?

ChrisRackauckas commented 3 years ago

If you run it longer does it converge? If you make each group one data point does it converge?

Manas2030 commented 3 years ago

When the difference term is ode_data[:,i+grp_size] - pred[:,grp_size] and we increase the number of iterations, it does try to converge, but something keeps it from converging at the 5th point (the last point of the first group). Here is the convergence after 1500 iterations (see plot). For any number of iterations, the 5th point stays at the same position.

If each group has one data point, then the predictions do not change at all.

ChrisRackauckas commented 3 years ago

Yes, look for an off-by-one error.

Manas2030 commented 3 years ago

Hi, I tried correcting the term and there is improvement over single shooting methods as in the screenshots below:

(screenshots)

But it still gets stuck in a local minimum for the different function y = x*sin(3x), or when the datasize is too big.

This is how I am calculating loss:

function loss_neuralode(p)
    tot_loss = 0
    for i in grp_size:grp_size:datasize_mini
        pred = predict_neuralode(ode_data[:,i-grp_size+1],tsteps[i-grp_size+1:i],p)
        loss = sum(abs2, (ode_data[:,i-grp_size+1:i] .- pred[:,1:grp_size]))
        tot_loss += (loss*loss) + (loss_multiplier * sum(abs,(ode_data[:,i-grp_size+1] - pred[:,1])))
    end
    pred = predict_neuralode(u0, tsteps[tsteps .<= 1.6], p)
    return tot_loss, pred
end

ChrisRackauckas commented 3 years ago

What about grouping one data point per group in the spiral neural ODE example? I know that works, so if that doesn't work with your form then there's a bug.

Manas2030 commented 3 years ago

Hi, I tried overlapping the groups. But when grouping one data point per group, I am not sure how the losses should be calculated, since there are no overlapping points between two adjacent groups. So for now I took a point and tried to predict the immediately next point.

function loss_neuralode(p)
    tot_loss = 0
    if(grp_size == 1)
        for i in 1:datasize_mini-1
            pred = predict_neuralode(ode_data[:,i],Array(tsteps[i:i+grp_size]),p)
            loss = sum(abs2, (ode_data[:,i:i+grp_size] .- pred[:,1:1+grp_size]))
            tot_loss += (loss*loss) + (loss_multiplier * sum(abs,(ode_data[:,i+ grp_size]-pred[:,1+grp_size])))
        end
        pred = predict_neuralode(u0, tsteps[tsteps .<= mini_timestep_value], p)
        return tot_loss, pred
    end
    for i in 1:grp_size-1:datasize_mini-grp_size
        pred = predict_neuralode(ode_data[:,i],tsteps[i:i+grp_size-1],p)
        loss = sum(abs2, (ode_data[:,i:i+grp_size-1] .- pred[:,1:grp_size]))

        tot_loss += (loss*loss) + (loss_multiplier * sum(abs,(pred[:,grp_size] - ode_data[:,i+grp_size-1])))
    end
    pred = predict_neuralode(u0, tsteps[tsteps .<= mini_timestep_value], p)
    return tot_loss, pred
end

This decently captures the shape of the data and doesn't get stuck in local minima (for the example I am working on); see the plot.

ChrisRackauckas commented 3 years ago

What if you do more iterations? 100 is kind of quick.

Manas2030 commented 3 years ago

It randomly oscillates like the following plots, very likely due to a large step size (the plots are just 2-3 iterations apart). (These are for grp_size = 1, iterations = 300.)

ChrisRackauckas commented 3 years ago

Start with 300 iterations of ADAM then do BFGS, like is shown in the tutorial. Post that.
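(For reference, the two-stage training suggested here follows the pattern in the DiffEqFlux tutorials; a sketch using the names defined above, with res_adam/res_bfgs as hypothetical result names:)

res_adam = DiffEqFlux.sciml_train(loss_neuralode, pinit, ADAM(0.05),
                                  cb = callback, maxiters = 300)
res_bfgs = DiffEqFlux.sciml_train(loss_neuralode, res_adam.minimizer,
                                  BFGS(initial_stepnorm = 0.01),
                                  cb = callback, allow_f_increases = false)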

Manas2030 commented 3 years ago

(plot: after LBFGS)

ChrisRackauckas commented 3 years ago

That looks pretty good šŸ˜„

Manas2030 commented 3 years ago

YESS šŸ˜ But I also tried it on y = x*sin(3x) with only 10 data points, and it got stuck in a local minimum really badly. Single shooting doesn't work either.

ChrisRackauckas commented 3 years ago

The question now is, how do you start to package it so that it's easier for users to do this? A function? Or just a tutorial? What are your thoughts?

Manas2030 commented 3 years ago

I don't have a clear answer, but I think a function would make it easier for others to use (without any bugs in indexing or dividing the data into groups) and wouldn't require much knowledge about multiple shooting. On the other hand, implementing it from a tutorial would give users quite some flexibility. But again, they can always implement it themselves in either case.

ChrisRackauckas commented 3 years ago

Try prototyping a design here. See if you can simplify down the user code by having something that helps calculate a multiple shooting loss.

Manas2030 commented 3 years ago

Hi Chris, sorry for the delay; I got really busy with my mid-semester exams. Here is a function that helps calculate the loss.

function loss_neuralode_helper(p :: Array, ode_data :: Array, tsteps :: Array, prob :: ODEProblem, loss_function ::Function, grp_size :: Integer = 5, loss_multiplier :: Integer = 100)
    tot_loss = 0
    if(grp_size == 1)
        for i in 1:datasize-1
            pred = Array(solve(remake(prob, p=p, tspan=(tsteps[i],tsteps[i+1]), u0=ode_data[:,i]), Tsit5(),saveat = tsteps[i:i+1]))
            tot_loss += loss_function(ode_data[:,i:i+1], pred[:,1:2]) + (loss_multiplier * sum(abs,ode_data[:,i+ 1]-pred[:,2]))
        end
        pred = predict_neuralode(u0, tsteps, p)
        return tot_loss, pred
    end

    for i in 1:grp_size-1:datasize-grp_size
        pred = solve(remake(prob, p=p, tspan=(tsteps[i],tsteps[i+grp_size-1]), u0=ode_data[:,i]), Tsit5(),saveat = tsteps[i:i+grp_size-1])
        tot_loss += loss_function(ode_data[:,i:i+grp_size-1], pred[:,1:grp_size]) + (loss_multiplier * sum(abs,pred[:,grp_size] - ode_data[:,i+grp_size-1]))
    end
    pred = predict_neuralode(u0, tsteps, p)
    return tot_loss, pred
end

This function can be called as follows:

function loss_neuralode(p)
    return loss_neuralode_helper(p, ode_data, tsteps, prob_param, loss_function_param, grp_size_param, loss_multiplier_param)
end

This should be helpful, as various parameters (like grp_size and loss_function) can also be changed during training itself, giving more flexibility/power to the user. What do you think about this?
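For example, the pluggable pieces might be set up like this (hypothetical values; the _param names match the call above):

loss_function_param(data, pred) = sum(abs2, data .- pred)   # per-group fitting loss
grp_size_param = 5                                          # points per group
loss_multiplier_param = 100                                 # weight of the continuity penalty
prob_param = prob                                           # the ODEProblem wrapping the neural network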

ChrisRackauckas commented 3 years ago

Yeah, looks useful. Start the PR

yewalenikhil65 commented 3 years ago

I am trying this same example using multiple shooting for my own clarity, but it's not getting trained successfully. I used the above-mentioned loss_neuralode_helper from the PR. Could anyone help me find where I am making a mistake? The loss value is increasing instead.

using DiffEqFlux,Flux
using GalacticOptim
using OrdinaryDiffEq
using Plots
using Plots: scatter, scatter!
gr()   # just a backend for plotting

u0 = Float32[2.0; 0.0]
datasize = 30;
batchsize = 5;
loss_multiplier= 100
tspan = (0.0f0, 5.0f0)
tsteps = range(tspan[1], tspan[2], length = datasize)

function trueODEfunc(du, u, p, t)
    true_A = [-0.1 2.0; -2.0 -0.1]
    du .= ((u.^3)'true_A)'
end
prob_trueode = ODEProblem(trueODEfunc, u0, tspan)
ode_data = Array(solve(prob_trueode, Tsit5(), saveat = tsteps))

NN = Chain(x -> x.^3, Dense(2, 16, tanh), Dense(16, 2))
p,re = Flux.destructure(NN)     # use this p as the initial condition!
dudt!(u,p,t) = re(p)(u)             # need to restructure for backprop!
prob = ODEProblem(dudt!, u0, tspan,p)

function loss_function(ode_data, pred)
    return sum(abs2, ode_data .- pred)
end
function loss_neuralode_helper(p)
    tot_loss = 0.0f0
    if(batchsize == 1)
        for i in 1:datasize-1
            pred = Array(solve(prob,Tsit5(),u0=ode_data[:,i],
                    tspan=(tsteps[i],tsteps[i+1]),p=p,saveat = tsteps[i:i+1]))
            loss = loss_function(ode_data[:,i:i+1], pred[:,1:2])
            tot_loss += (loss*loss) + (loss_multiplier * sum(abs, ode_data[:,i+1]-pred[:,2]))
        end
        pred = Array(solve(prob, Tsit5(),u0=ode_data[:,1] ,
                    tspan = (tsteps[1],tsteps[datasize]), p=p ,saveat = tsteps))
        return tot_loss, pred
    end
    for i in 1:batchsize-1:datasize-batchsize
        pred = Array(solve(prob,Tsit5(), u0=ode_data[:,i],
            tspan=(tsteps[i],tsteps[i+batchsize-1]),p=p,saveat = tsteps[i:i+batchsize-1]))

        loss = loss_function(ode_data[:,i:i+batchsize-1], pred[:,1:batchsize])
        tot_loss += (loss*loss) + (loss_multiplier * sum(abs,pred[:,batchsize] - ode_data[:,i+batchsize-1]))
    end
    pred = Array(solve(prob, Tsit5(),u0=ode_data[:,1],
                tspan = (tsteps[1],tsteps[datasize]), p = p,saveat = tsteps))
    return tot_loss, pred
end
loss_neuralode_helper(p)  # initial loss value
callback = function (p, l, pred; doplot = true)
  display(l)
  # plot current prediction against data
  plt = scatter(tsteps, ode_data[1,:], label = "data")
  scatter!(plt, tsteps, pred[1,:], label = "prediction")
  if doplot
    display(plot(plt))
  end
  return false
end

adtype = GalacticOptim.AutoForwardDiff()
optf = GalacticOptim.OptimizationFunction((x, p) -> loss_neuralode_helper(x),adtype)
optfunc = GalacticOptim.instantiate_function(optf, prob.p, adtype, nothing)
optprob = GalacticOptim.OptimizationProblem(optfunc, prob.p)

result_neuralode = GalacticOptim.solve(optprob,ADAM(0.05),cb = callback,maxiters = 300)
Manas2030 commented 3 years ago

Hey, it looks like in the code above you forgot to solve/define ode_data. Besides that, your code is working fine; the predictions are converging, but at a very slow rate. Also, each iteration of training is slower using GalacticOptim.solve (at least for me). These pics are from the loss function you defined (see plots). Also, in the above code I changed ADAM(0.01) to ADAM(0.05), which made it a lot easier and faster to converge. In my code from the PR, I used DiffEqFlux.sciml_train as in the tutorial, which I found faster. Hope this helps.

yewalenikhil65 commented 3 years ago

Hey, it looks like in the code above you forgot to solve/define ode_data.

@Manas2030 Sorry, that was a copy-paste issue; I've edited it now.

Besides that, your code is working fine; the predictions are converging, but at a very slow rate. Also, each iteration of training is slower using GalacticOptim.solve (at least for me). In my code from the PR, I used DiffEqFlux.sciml_train as in the tutorial, which I found faster. Hope this helps.

I hope the batchsize I defined is the same as the grp_size that you plotted. Actually, I tried both ways, solve from GalacticOptim and DiffEqFlux.sciml_train, with a larger learning rate (0.05) and also with more relaxed learning rates, but it never could decrease the loss function :-(

Vaibhavdixit02 commented 3 years ago

Also, each iteration of training is slower using GalacticOptim.solve (at least for me).

That sounds very unlikely, but I would like to see code to reproduce it if you have it handy to investigate.

yewalenikhil65 commented 3 years ago

Also, each iteration of training is slower using GalacticOptim.solve (at least for me).

That sounds very unlikely, but I would like to see code to reproduce it if you have it handy to investigate.

Could you please try the code I have posted a couple of comments above? I have used the same loss function provided by @Manas2030 in the PR, and it also uses GalacticOptim.

Manas2030 commented 3 years ago

@Vaibhavdixit02 Hi Vaibhav, you're right; it turned out the step size in ADAM was 0.001, which I mistakenly read as 0.01. It works just as fast after changing it. Sorry for that. Also, could remaking an ODEProblem (as in the snippets below) in each iteration have added to the slowness? Just wondering.

pred = Array(solve(prob,Tsit5(), u0=ode_data[:,i], tspan=(tsteps[i],tsteps[i+batchsize-1]),p=p,saveat = tsteps[i:i+batchsize-1]))

Below snippet is with remaking a problem:

pred = Array(solve(remake(prob, p = p, tspan = (tsteps[i],tsteps[i+batchsize-1]), u0 = ode_data[:,i]), Tsit5(),saveat = tsteps[i:i+batchsize-1]))

@yewalenikhil65 Hi Nikhil, could you please update/upload the exact code you're working on? Using the exact code you mentioned above, I was getting this error: Tried to add a tstop that is behind the current time. This is strictly forbidden. Also, gr() needs to be imported. šŸ˜…

Vaibhavdixit02 commented 3 years ago

I am not sure if there's a cost associated with remake. You can experiment on your end with both alternatives and time them to check
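A rough way to time the two alternatives (a sketch, assuming BenchmarkTools is available; i = 1 just picks the first group):

using BenchmarkTools
i = 1

# alternative 1: pass the overrides directly to solve (first snippet above)
@btime Array(solve(prob, Tsit5(), u0 = ode_data[:, i],
                   tspan = (tsteps[i], tsteps[i+batchsize-1]), p = p,
                   saveat = tsteps[i:i+batchsize-1]))

# alternative 2: remake the problem first, then solve (second snippet above)
@btime Array(solve(remake(prob, p = p, tspan = (tsteps[i], tsteps[i+batchsize-1]),
                          u0 = ode_data[:, i]),
                   Tsit5(), saveat = tsteps[i:i+batchsize-1]))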

yewalenikhil65 commented 3 years ago

@yewalenikhil65 Hi Nikhil, could you please update/upload the exact code you're working on? Using the exact code you mentioned above, I was getting this error: Tried to add a tstop that is behind the current time. This is strictly forbidden. Also, gr() needs to be imported.

@Manas2030, I have updated/edited it in the previous comment. gr() is just a backend for Plots; you could use any Plots backend that you like. As for the error Tried to add a tstop that is behind the current time. This is strictly forbidden, I am not encountering it if you just copy-paste that code: https://github.com/SciML/DiffEqFlux.jl/issues/481#issuecomment-798972103

Manas2030 commented 3 years ago

This is very strange; I'm still getting that error. Maybe someone else could take a look? šŸ˜•

yewalenikhil65 commented 3 years ago

I would be glad if someone took a look at https://github.com/SciML/DiffEqFlux.jl/issues/481#issuecomment-798972103, as its loss function is exactly the same as that of the multiple shooting PR. Besides, I was wondering if there is a cleaner way to split the dataset, as the indices inside the for loops were very confusing at first glance. Could DataLoader from Flux help? It's just that I don't see the data overlapping at the end points using this approach. Is that a necessary condition for multiple shooting?

train_loader = Flux.Data.DataLoader(ode_data, t, batchsize = k, partial = false)
for (x,y) in train_loader
    @show x;   # a chunk of size k of ode_data
    @show y;  # a chunk of size k of t
end
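For comparison with the DataLoader snippet above, overlapping windows can also be built directly from index ranges (a sketch; k is the group size as above, and tsteps are the saved time points):

k = batchsize   # group size (the snippet above calls it k)
ranges = [i:i+k-1 for i in 1:k-1:size(ode_data, 2)-k+1]   # adjacent ranges share one point
for r in ranges
    @show ode_data[:, r]   # a chunk of size k of ode_data, overlapping its neighbours
    @show tsteps[r]        # the matching chunk of time points
end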