AndreiBarsan opened this issue 8 years ago
Extra note to self: fewer hidden units and more depth make more sense than many hidden units but few layers. Something like 5 x 256 is worth considering; 5 x 512 (what I tried out on AWS) would be overkill, and 5 x 128 would not be far-fetched either.
We could also drop the fixed embedding-sized layer; it doesn't make much sense.
It would also make sense to experiment with a rule for increasing or decreasing layer size with depth. For example, a 64 -> 128 -> 256 -> 256 hidden-unit schedule would be interesting.
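A minimal sketch of such a rule (my own hypothetical helper, not anything from the codebase): double the hidden size with each layer until it hits a cap, which reproduces the 64 -> 128 -> 256 -> 256 schedule above.

```python
def pyramid_sizes(base, depth, cap):
    """Hidden-unit schedule that doubles with depth until it reaches a cap.

    Purely illustrative; the resulting list would be fed to whatever
    stacked-RNN constructor the model uses.
    """
    sizes = []
    size = base
    for _ in range(depth):
        sizes.append(min(size, cap))
        size *= 2
    return sizes

print(pyramid_sizes(64, 4, 256))  # [64, 128, 256, 256]
```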
Dropout is very important! A dropout rate of 0.25 is much too little, and 0.5 doesn't seem like overkill either. A rate of 0.75 (keep_prob = 0.25) might not be far-fetched for a deeper net!
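For reference, the keep_prob convention above is the inverted-dropout formulation: each unit is kept with probability keep_prob and survivors are scaled up so the expected activation is unchanged. A toy, framework-free sketch:

```python
import random

def inverted_dropout(activations, keep_prob, rng=None):
    """Inverted dropout: drop each unit with probability (1 - keep_prob),
    scaling survivors by 1 / keep_prob so no rescaling is needed at test time.

    Toy illustration on plain Python lists; a real net would do this on
    tensors via the framework's dropout op.
    """
    rng = rng or random.Random(0)
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

# keep_prob = 0.25 corresponds to the aggressive 0.75 dropout rate above.
out = inverted_dropout([1.0] * 8, 0.25)
```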
According to a recent paper on LSTM sentiment analysis, using a bidirectional RNN instead of a regular ("one-way") one might lead to improved accuracy.
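The bidirectional idea can be sketched without any framework: run the same recurrence left-to-right and right-to-left, then pair up the states so each position sees context from both directions. The step function here is a stand-in for an LSTM cell.

```python
def run_rnn(xs, step, h0=0.0):
    """Unroll a recurrence over the sequence, returning the state at each step."""
    hs, h = [], h0
    for x in xs:
        h = step(h, x)
        hs.append(h)
    return hs

def bidirectional(xs, step):
    """Concatenate forward and backward passes per position.

    `step` stands in for an LSTM cell; in practice the two directions
    would use separate weights.
    """
    fwd = run_rnn(xs, step)
    bwd = list(reversed(run_rnn(list(reversed(xs)), step)))
    return list(zip(fwd, bwd))

# With a trivial additive "cell", position 0 already sees the whole sequence
# through the backward state:
print(bidirectional([1, 2, 3], lambda h, x: h + x))
# [(1, 6), (3, 5), (6, 3)]
```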
I will experiment with this after the deadline.