If you want to use RNNs, just use Flux's RNNs and then use the destructure form, as you demonstrate here. That should then be fine. Without the destructure form there's no way to have the parameter vector influence the differential equation for the adjoint. Does that make sense?
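(For reference, a minimal sketch of this destructure pattern, shown with a feed-forward net for simplicity; the dudt name and layer sizes here are made up:)

using Flux, DifferentialEquations

ANN = Chain(Dense(1, 5, tanh), Dense(5, 1))
p, re = Flux.destructure(ANN)   # flatten all weights into the vector p

# The parameters now enter the right-hand side explicitly, so the adjoint
# can differentiate the solve with respect to p:
dudt(u, p, t) = re(p)(u)
prob = ODEProblem(dudt, Float32[0.0], (0.0f0, 1.0f0), p)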
Yes, it makes sense and that is clear to me.
But I'm not sure yet whether the destructured network indeed works like a recurrent network. For example:
> ANN = Chain(LSTM(1, 5), Dense(5, 1));
> param, func = Flux.destructure(ANN);
> ANN(0.5)
1×1 Array{Float32,2}:
0.1435646
> ANN(0.5)
1×1 Array{Float32,2}:
0.23851033
> ANN(0.5)
1×1 Array{Float32,2}:
0.30093935
> func(param)(0.5)
1×1 Array{Float32,2}:
0.1435646
> func(param)(0.5)
1×1 Array{Float32,2}:
0.1435646
> func(param)(0.5)
1×1 Array{Float32,2}:
0.1435646
Above you can see that the normal network ANN receives the same input three times but produces a different output each time, due to the influence of the previous states; but the destructured network func always produces the same output for equal inputs, like a feed-forward network.
My question is: when I call the ODE, will func (the destructured network) work like a recurrent network or like a feed-forward one?
It works exactly like whatever network you destructured. It just rebuilds that same network with new parameters.
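(A hypothetical illustration of "rebuilds that same network with new parameters"; the perturbation is arbitrary:)

p2 = param .+ 0.1f0   # any vector of the same length as param works
func(p2)(0.5)         # same architecture as ANN, evaluated with the new weights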
Ok, thank you so much!
But I am still wondering why, in the code above, it doesn't work like a recurrent network. I guess it is because I made three separate calls, so each call is a fresh initialization.
Thank you so much, and excuse me for being so persistent.
What do you mean "it doesn't work like a recurrent network"? I'm lost which example you're pointing to.
> ANN = Chain(LSTM(1, 5), Dense(5, 1));
> param, func = Flux.destructure(ANN);
Above there are two definitions (the normal ANN and the destructured func) of the same network, a recurrent one. When the same input is passed twice to ANN, it produces different outputs, due to the influence of the first input/output:
> ANN(0.5)
1×1 Array{Float32,2}:
0.1435646
> ANN(0.5)
1×1 Array{Float32,2}:
0.23851033
But when I do the same with func, no matter how many times I pass an input, the previous inputs/outputs are not considered (it is not recurrent):
> func(param)(0.5)
1×1 Array{Float32,2}:
0.1435646
> func(param)(0.5)
1×1 Array{Float32,2}:
0.1435646
> func(param)(0.5)
1×1 Array{Float32,2}:
0.1435646
I just checked: the state is reinitialized every time you do func(param). This probably needs to be fixed in Flux.
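(To see the distinction, compare repeated calls on one rebuilt model with rebuilding it each time; a hypothetical session:)

m = func(param)
m(0.5); m(0.5)     # successive calls on the same rebuilt model do advance its state
func(param)(0.5)   # but every fresh func(param) restarts from the state stored in param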
@junsebas97 Could you test if this works for you?
# A Recur variant that keeps the state out of the trainable parameters:
mutable struct MyRecur{T}
    cell::T
    init
    state
end

function (m::MyRecur)(xs...)
    h, y = m.cell(m.state, xs...)   # run the cell from the current state
    m.state = h                     # carry the new state over to the next call
    return y
end

Flux.@functor MyRecur cell, init, state
Flux.trainable(r::MyRecur) = Flux.trainable(r.cell)  # only the cell's weights are trainable

c = Flux.LSTMCell(1, 1)
ANN = MyRecur(c, Flux.hidden(c), Flux.hidden(c))  # this is essentially your LSTM
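(A hypothetical quick check of this fix:)

ANN(0.5)
ANN.state                       # should now differ from the initial Flux.hidden(c)
p, re = Flux.destructure(ANN)
re(p)(0.5)                      # the question is whether this continues from ANN.state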
If it does, we can put this fix in.
EDIT: Thanks to @timdkim for pointing out that destructure ends up returning state as well. Hence this solution will not work inside NeuralODE.
Hi, I've implemented it, but it doesn't seem to be working (although I'm not sure if my implementation is correct).
With the LSTM cell you defined above:
ANN = MyRecur(c, Flux.hidden(c), Flux.hidden(c))
I use Flux.destructure and I have the same issue: the state is reinitialized every time I do func(param), therefore it doesn't consider the previous inputs:
> c = Flux.LSTMCell(1, 1);
> ANN = MyRecur(c, Flux.hidden(c), Flux.hidden(c));
> par, func = Flux.destructure(ANN);
> func(par).state
(Float32[0.0], Float32[0.0])
> func(par)(0.5)
1×1 Array{Float64,2}:
0.07713145256002776
> func(par).state
(Float32[0.0], Float32[0.0])
> func(par)(0.5)
1×1 Array{Float64,2}:
0.07713145256002776
So I tried to use ANN normally (without Flux.destructure) and I was not even able to solve the following system:
> c = Flux.LSTMCell(1, 1);
> ANN = MyRecur(c, Flux.hidden(c), Flux.hidden(c));
> dudt(u, p, t) = ANN(u);
> prob = ODEProblem(dudt, 0.0, (0.0, 10.0));
> solve(prob, Tsit5(), saveat = 0.1)
MethodError: no method matching similar(::Float64, ::Type{Float64})
Sorry if I'm missing something. Thank you so much!
Hi @junsebas97, the example you posted is working as expected. When you destructure ANN, it has 0 as its state values. So when you do func(par), it basically reconstructs the layer with the old parameter values.
c = Flux.LSTMCell(1, 1);
ANN = MyRecur(c, Flux.hidden(c), Flux.hidden(c));
ANN(1.0); ANN.state                 # the call advances the state
par, func = Flux.destructure(ANN);  # the current state is captured in par here
m = func(par)
m.state                             # same value as ANN.state
m(2.0)
par, func = Flux.destructure(m);    # I need to destructure again for the new state to be reflected
Regarding the ODEProblem, I need to look a bit more into it.
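(A guess, not confirmed in this thread: the MethodError looks like the usual scalar-versus-array mismatch, since the initial condition is a plain Float64 while ANN(u) returns an array. Using arrays throughout may help:)

dudt(u, p, t) = vec(ANN(u))                  # return a vector instead of a 1×1 matrix
prob = ODEProblem(dudt, [0.0], (0.0, 10.0))  # array initial condition
solve(prob, Tsit5(), saveat = 0.1)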
I'm finding that the values in par change when the state changes, even though we are not updating any parameters. Is this expected?
Thanks for pointing that out. Flux.destructure returns state as well (which is what's changing). I will have to look into how to fix it.
Wait, this state, are you trying to preserve it between f calls?
Yes. The OP wanted to preserve state in the f calls across time and reset once we reevaluate from an initial condition.
That doesn't make all that much sense. ODEs don't solve in a way that monotonically increases in time, so sharing state might not have the behavior you'd expect. Having extra state would need to be stored in additional ODEs.
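(One hypothetical way to realize "extra state stored in additional ODEs": augment the ODE state with a hidden component that has its own learned dynamics. All names and sizes below are made up:)

using Flux, DifferentialEquations

nu, nh = 1, 5                          # observed and hidden state sizes (arbitrary)
f = Chain(Dense(nu + nh, nu))          # dynamics of the observed state
g = Chain(Dense(nu + nh, nh, tanh))    # dynamics of the hidden "memory" state
p1, re1 = Flux.destructure(f)
p2, re2 = Flux.destructure(g)
p = vcat(p1, p2)

function dudt!(duh, uh, p, t)
    duh[1:nu]     .= re1(p[1:length(p1)])(uh)       # observed-state derivative
    duh[nu+1:end] .= re2(p[length(p1)+1:end])(uh)   # hidden-state derivative
end

prob = ODEProblem(dudt!, zeros(Float32, nu + nh), (0.0f0, 1.0f0), p)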
Yes, I agree that in this particular context it doesn't make a lot of sense, but there is definitely a bug on Flux's side, since with re(p) I would expect to get back the same model I destructured.
Yup, so this should probably get an appropriate Flux issue, and we can close here.
Moving the discussion over to FluxML/Flux.jl#1329
Hello, first of all, thanks for this package, it is very useful.
I'm new to the scientific machine learning field and currently I'm learning about, and very interested in, UDEs for modeling physical systems. So far I've incorporated just multilayer perceptrons into the ODEs, getting good results. However, I tried to incorporate recurrent networks in order to increase the capability of these models, but the training with Flux.train! crashes. For example, the next model uses 5 LSTM cells in its definition.
As Chain was used instead of FastChain, the training is with Flux.train!, but it fails:
ERROR: LoadError: MethodError: no method matching similar(::DiffEqBase.NullParameters)
I think it is because Julia is able to recognize the weights of the network as parameters of ANN but not of model(du, u, p, t). So to avoid this error, I instead used Flux.destructure to define the recurrent network. In this way, I was able to train with Flux.train! and DiffEqFlux.sciml_train and no errors appeared. But the workflow of this new network is not really clear to me, because this "destructured" network seems not to consider the previous states like the "normal" network does.
I'd be very grateful if somebody could clarify how fun(u, par) works (is it truly recurrent, or does it work like a multilayer perceptron?), or could you tell me how to correctly train UDEs with recurrent networks?
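(The code blocks from this post did not survive extraction. Based on the REPL session earlier in the thread, they presumably looked something like the following; the exact definitions are an assumption:)

ANN = Chain(LSTM(1, 5), Dense(5, 1))    # the model with 5 LSTM cells
model(du, u, p, t) = (du .= ANN(u))     # the weights live in ANN, not in p

par, fun = Flux.destructure(ANN)        # the destructured version
dudt(u, p, t) = fun(p)(u)               # now the weights enter through p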