IDSIA / brainstorm

Fast, flexible and fun neural networks.

Truncated BPTT #73

Open jramapuram opened 8 years ago

jramapuram commented 8 years ago

Is it possible to do truncated BPTT currently? I have a really long time series: 1411889 samples. This overflows when trying to train on any backend.

Qwlouse commented 8 years ago

We don't have specific support for truncated BPTT currently. What you can do is chunk up your sequence and treat the chunks as separate sequences. That will lose the internal state between chunks, but it will at least allow you to train. Carrying the internal state across chunks for precisely this use case is on our agenda (see #57).
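The chunking workaround described above could be sketched as follows. This is a minimal illustration in plain NumPy, not part of the brainstorm API: the helper name `chunk_sequence`, the `(time, features)` layout, and the chunk length are all assumptions made for the example.

```python
import numpy as np

def chunk_sequence(series, chunk_len):
    """Split a (time, features) array into consecutive chunks.

    Hypothetical helper, not a brainstorm function. Each chunk holds at
    most chunk_len time steps; the last chunk may be shorter.
    """
    series = np.asarray(series)
    return [series[start:start + chunk_len]
            for start in range(0, len(series), chunk_len)]

# Small stand-in for the long series: 10 time steps, 2 features.
series = np.arange(20, dtype=np.float32).reshape(10, 2)
chunks = chunk_sequence(series, chunk_len=4)

# Each chunk would then be fed to training as its own sequence.
for i, chunk in enumerate(chunks):
    print(i, chunk.shape)
```

Since every chunk starts from a fresh internal state, dependencies longer than `chunk_len` steps cannot be learned with this approach.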

jramapuram commented 8 years ago

Yeah, I had considered chunking; however, as you mentioned, the cross-sequence context is lost, which in turn prevents learning truly 'long-term' dependencies. Looking forward to your solution to #57.

jramapuram commented 8 years ago

Any news with this?

flukeskywalker commented 8 years ago

We realized that this was not needed for our current experiments, so we wouldn't be able to test it properly. We haven't finalized how this should be cleanly integrated with everything else, but the lower-level functionality needed to get and restore context is in place, so it should be possible to write a custom SgdStepper that restores context across forward passes with the help of network.get_context().
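To illustrate what restoring context across forward passes means, here is a library-independent sketch using a toy tanh RNN in NumPy. It deliberately does not use brainstorm: `rnn_forward`, the weight names, and the sizes are assumptions made for the example, and a real SgdStepper would obtain and restore the state through the network's own context mechanism instead.

```python
import numpy as np

def rnn_forward(x, h0, w_in, w_rec):
    """Run a plain tanh RNN over x (time, n_in), starting from state h0.

    Toy stand-in for a network forward pass. Returns all hidden states
    and the final hidden state, which plays the role of the context.
    """
    h = h0
    states = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states), h

rng = np.random.default_rng(0)
n_in, n_hidden, chunk_len = 3, 5, 4
w_in = rng.normal(scale=0.5, size=(n_in, n_hidden))
w_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
x = rng.normal(size=(10, n_in))
h0 = np.zeros(n_hidden)

# Reference: one forward pass over the whole sequence.
full_states, _ = rnn_forward(x, h0, w_in, w_rec)

# Chunked: carry the final state of each chunk into the next chunk.
context = h0
chunked_states = []
for start in range(0, len(x), chunk_len):
    states, context = rnn_forward(x[start:start + chunk_len],
                                  context, w_in, w_rec)
    chunked_states.append(states)
chunked_states = np.concatenate(chunked_states)

print(np.allclose(full_states, chunked_states))
```

With the state carried over, the chunked forward pass reproduces the full one. In truncated BPTT the backward pass would still stop at each chunk boundary, so only the forward context crosses chunks, not the gradient.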