eileen-bluerose opened this issue 5 years ago
Hi Celina,
This one unfortunately takes a long time.
There are a few reasons for this, the biggest of which is minibatching. For convolutional NNs (which were the first proving ground for grenade), almost all non-trivial computations are matrix-matrix multiplications, so minibatching doesn't give a huge computational benefit, and I left it out for semantic clarity. For LSTMs, however, the work is mostly matrix-vector multiplications, which become matrix-matrix multiplications with minibatching. That makes a big difference: with a decent BLAS library, multiplying by a 50-column minibatch takes only about 5 times as long as a single matrix-vector operation, not 50 times.
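To make that concrete, here's a rough illustration using hmatrix; it's only a sketch with made-up sizes, not grenade's internal code:

```haskell
import Prelude hiding ((<>))
import Numeric.LinearAlgebra

main :: IO ()
main = do
  let n         = 512                               -- hidden size (made up)
      w         = ident n         :: Matrix Double  -- stand-in weight matrix
      batch     = konst 1 (n, 50) :: Matrix Double  -- minibatch of 50 input columns
      -- Without minibatching: 50 separate matrix-vector products (GEMV).
      unbatched = map (w #>) (toColumns batch)
      -- With minibatching: a single matrix-matrix product (GEMM) over all
      -- 50 columns, which BLAS handles far more efficiently than 50 GEMVs.
      batched   = w <> batch
  print (sum (map norm_2 unbatched))  -- forces the per-column products
  print (norm_2 (flatten batched))    -- forces the single batched product
```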
The other two things I have done differently are not propagating the previous batch's input vector (which would line batches up for truncated back-propagation through time) and using only SGD with momentum instead of Adam or another optimiser.
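For reference, the momentum update I mean is the classic one; roughly (a plain-Haskell sketch for a single scalar weight, not grenade's actual optimiser code):

```haskell
-- Generic SGD-with-momentum step; illustration of the rule only.
sgdMomentum
  :: Double            -- learning rate (e.g. 0.01)
  -> Double            -- momentum coefficient (e.g. 0.9)
  -> Double            -- gradient of the loss w.r.t. the weight
  -> (Double, Double)  -- (weight, velocity) before the step
  -> (Double, Double)  -- (weight, velocity) after the step
sgdMomentum rate mom g (w, v) =
  let v' = mom * v - rate * g  -- velocity accumulates a decayed gradient history
      w' = w + v'              -- step along the velocity, not the raw gradient
  in  (w', v')

main :: IO ()
main = print (sgdMomentum 0.01 0.9 2.0 (1.0, 0.0))
```

Compared with Adam, this converges more slowly on this task, which is part of why the example needs such long training runs.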
If memory serves, I trained for over 24 hours, slowly ramping up the training sequence length: starting at about 15 characters until spaces were interspersed well, then working up to 25 and 50 as words and basic grammar appeared.
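In pseudo-Haskell the ramp-up looked something like this; trainToConvergence is a placeholder for however you slice the corpus and drive grenade's training loop, not a real grenade function, and the stopping criteria were judged by eye:

```haskell
-- Hypothetical sketch of the sequence-length ramp-up described above.
rampUp :: Monad m => (Int -> m ()) -> m ()
rampUp trainToConvergence = do
  trainToConvergence 15  -- until spaces are interspersed sensibly
  trainToConvergence 25  -- until recognisable words appear
  trainToConvergence 50  -- until basic grammar shows up

main :: IO ()
main = rampUp (\len -> putStrLn ("train with sequence length " ++ show len))
```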
Huw
I ran the Shakespeare example overnight and got some awkward results, both from the Hackage release package (shakespeare_output_from_clean_cabal_installation.txt) and from the freshly downloaded source code (shakespeare_output_from_github_source.txt).
I used the training data proposed in the source code: https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
How should I run the example so that it produces more realistic output (like the generated sequence shown in the example)?