AndreiBarsan opened this issue 8 years ago
Extra note to self: fewer hidden units and more depth make more sense than many hidden units but few layers. Something like 5 x 256 is worth considering; 5 x 512 (what I tried out on AWS) would be overkill, and 5 x 128 would not be far-fetched either.
We could also drop the fixed embedding-sized layer; it doesn't make much sense.
It would also make sense to experiment with a rule for increasing or decreasing layer size with depth. For example, a 64 -> 128 -> 256 -> 256 hidden-unit schedule would be interesting.
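A minimal sketch of such a rule (my own hypothetical helper, not anything from the codebase): double the hidden size with each layer until it hits a cap, which reproduces the 64 -> 128 -> 256 -> 256 schedule above.

```python
def pyramid_sizes(base, depth, cap):
    """Hidden-unit schedule that doubles with depth until it reaches a cap.

    Purely illustrative; the resulting list would be fed to whatever
    stacked-RNN constructor the model uses.
    """
    sizes = []
    size = base
    for _ in range(depth):
        sizes.append(min(size, cap))
        size *= 2
    return sizes

print(pyramid_sizes(64, 4, 256))  # [64, 128, 256, 256]
```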
Dropout is very important! A dropout rate of 0.25 is much too little, and 0.5 doesn't seem like overkill either. A rate of 0.75 (keep_prob = 0.25) might not be far-fetched for a deeper net!
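For reference, the keep_prob convention above is the inverted-dropout formulation: each unit is kept with probability keep_prob and survivors are scaled up so the expected activation is unchanged. A toy, framework-free sketch:

```python
import random

def inverted_dropout(activations, keep_prob, rng=None):
    """Inverted dropout: drop each unit with probability (1 - keep_prob),
    scaling survivors by 1 / keep_prob so no rescaling is needed at test time.

    Toy illustration on plain Python lists; a real net would do this on
    tensors via the framework's dropout op.
    """
    rng = rng or random.Random(0)
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

# keep_prob = 0.25 corresponds to the aggressive 0.75 dropout rate above.
out = inverted_dropout([1.0] * 8, 0.25)
```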
According to a recent paper on LSTM sentiment analysis, using a bidirectional RNN instead of a regular ("one-way") one might lead to improved accuracy.
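The bidirectional idea can be sketched without any framework: run the same recurrence left-to-right and right-to-left, then pair up the states so each position sees context from both directions. The step function here is a stand-in for an LSTM cell.

```python
def run_rnn(xs, step, h0=0.0):
    """Unroll a recurrence over the sequence, returning the state at each step."""
    hs, h = [], h0
    for x in xs:
        h = step(h, x)
        hs.append(h)
    return hs

def bidirectional(xs, step):
    """Concatenate forward and backward passes per position.

    `step` stands in for an LSTM cell; in practice the two directions
    would use separate weights.
    """
    fwd = run_rnn(xs, step)
    bwd = list(reversed(run_rnn(list(reversed(xs)), step)))
    return list(zip(fwd, bwd))

# With a trivial additive "cell", position 0 already sees the whole sequence
# through the backward state:
print(bidirectional([1, 2, 3], lambda h, x: h + x))
# [(1, 6), (3, 5), (6, 3)]
```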
I will experiment with this after the deadline.