JackKelly / neuralnilm_prototype

MIT License

Truncated back prop #47

Closed JackKelly closed 9 years ago

JackKelly commented 9 years ago

Use small sequences (100). See page 9 in Graves 2014.

This looks like a good tutorial on backprop through time (BPTT), which I think is what Graves uses. The tutorial briefly mentions truncating the gradient, although I think Graves does it in a different way.

Janczak, *Identification of Nonlinear Systems Using Neural Networks and Polynomial Models* (2005), might also have some useful info.

Does skaee mention this in his code or papers?

Does Graves do it on "micro batches" of 100 or does he do a sliding window? Reread Graves 2014. Check his book. Check the PhD thesis he cites. Check his code.

Perhaps it's as simple as splitting the data into batches and having an option to allow activations to persist between sequences or between batches.
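To make the "activations persist between batches" idea concrete, here is a minimal NumPy sketch (not the project's actual code; the function and parameter names are illustrative). It runs a toy tanh RNN over a long sequence in fixed-length chunks, optionally carrying the hidden state across chunk boundaries:

```python
import numpy as np

def run_in_chunks(x, W_h, W_x, chunk_len=100, persist=True):
    """Process a long 1-D input sequence in fixed-length chunks with a
    simple tanh RNN cell.  When `persist` is True, the final hidden
    state of one chunk seeds the next, so activations carry across
    chunk boundaries even though gradients (in a real training loop)
    would be truncated at them.  Illustrative sketch only."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for start in range(0, len(x), chunk_len):
        chunk = x[start:start + chunk_len]
        if not persist:
            h = np.zeros_like(h)   # reset activations at each chunk
        for x_t in chunk:
            h = np.tanh(W_h @ h + W_x * x_t)
        outputs.append(h.copy())   # final hidden state of this chunk
    return outputs
```

With `persist=True` this is the "sliding state" variant; with `persist=False` each chunk of 100 is treated as an independent sequence, which matches the "small sequences (100)" option above.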

JackKelly commented 9 years ago

Amazing. craffel was working on this just a few days ago and it is now implemented (just pass `truncate_gradient` to `scan`).
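For reference, the effect of `truncate_gradient=k` is to cut the backward pass after `k` time steps. A small NumPy sketch of that cut, on a toy scalar RNN `h_t = tanh(w * h_{t-1} + x_t)` (this is an illustration of the technique, not Theano's actual implementation, and all names here are made up):

```python
import numpy as np

def truncated_bptt_grad(x, h0, w, k):
    """Gradient of the toy RNN's final state h_T w.r.t. the recurrent
    weight w, backpropagating through at most the last `k` time steps.
    Theano's `scan(..., truncate_gradient=k)` performs the analogous
    cut inside its symbolic backward pass."""
    # forward pass, keeping the states needed for backprop
    hs = [h0]
    for x_t in x:
        hs.append(np.tanh(w * hs[-1] + x_t))
    # backward pass: accumulate d h_T / d w, stopping after k steps
    T = len(x)
    grad, dh = 0.0, 1.0            # dh = d h_T / d h_t, starting at t = T
    for t in range(T, max(T - k, 0), -1):
        pre = w * hs[t - 1] + x[t - 1]
        dpre = dh * (1.0 - np.tanh(pre) ** 2)
        grad += dpre * hs[t - 1]   # local contribution via w * h_{t-1}
        dh = dpre * w              # carry the gradient one step back
    return grad
```

With `k >= len(x)` this is full BPTT; smaller `k` trades gradient accuracy for memory and compute, which is the trade-off Graves exploits with short sequences.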