Can not converge with LSTM

zbzhu commented 6 years ago

Hi

I tried to run addition_test.py. However, I find that the MSE loss does not converge to 0 with the LSTM, and with IndRNN it needs about 100 steps to converge.

I wonder why this happens. I attempted to change the code to get a better result, but every attempt failed.

Thank you.

zbzhu commented 6 years ago

1. The LSTM code in addition_test.py has a problem. The custom IndRNN needs inputs with shape [batch_size, time_steps, dim]. However, the default input format of the LSTM in PyTorch is [time_steps, batch_size, dim].

When I set the learning rate to 0.002, the IndRNN method converges in about 20 steps, in accordance with the figure in the paper. I think this is closely related to the internal implementation of the different frameworks.
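To illustrate the layout mismatch described above, here is a minimal sketch (not the code from addition_test.py; the sizes and variable names are made up for illustration). It shows that `nn.LSTM` accepts a batch-first tensor without raising an error but reads dimension 0 as time, and two common ways to handle it: passing `batch_first=True`, or transposing the input.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for this illustration.
batch_size, time_steps, dim, hidden = 4, 7, 2, 8

# Batch-first input: [batch_size, time_steps, dim].
x = torch.randn(batch_size, time_steps, dim)

# Default nn.LSTM expects [time_steps, batch_size, dim]. Feeding x directly
# raises no error, but it is read as 4 time steps of 7 sequences.
lstm_default = nn.LSTM(input_size=dim, hidden_size=hidden)
out_wrong, (h_wrong, _) = lstm_default(x)

# Option A: declare that inputs are batch-first.
lstm_bf = nn.LSTM(input_size=dim, hidden_size=hidden, batch_first=True)
out_bf, (h_bf, _) = lstm_bf(x)

# Option B: keep the default layout and swap the first two dimensions.
out_tf, (h_tf, _) = lstm_default(x.transpose(0, 1))

print(h_wrong.shape)  # final state for 7 "sequences": the batch was misread
print(h_bf.shape)     # final state for 4 sequences, as intended
print(out_tf.shape)   # time-first output: [time_steps, batch_size, hidden]
```

Because the mismatched call runs silently, the only visible symptom is a loss that fails to improve, which matches the behaviour reported in this issue.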

StefOe commented 6 years ago

Hej, good catch! I am currently rewriting the code to follow PyTorch's input conventions. Expect a push soon!

StefOe commented 6 years ago

OK, committed. Does it work for you now?

My results on the add problem: LSTM now converges after ~100 iterations:
MSE after 10200 iterations: 0.15238375902175905
MSE after 10300 iterations: 0.15270531803369522
MSE after 10400 iterations: 0.1444737881422043
MSE after 10500 iterations: 0.1300613449513912
MSE after 10600 iterations: 0.12781122975051404
MSE after 10700 iterations: 0.10413295090198517
MSE after 10800 iterations: 0.09269013416022062
MSE after 10900 iterations: 0.07342484388500452
MSE after 11000 iterations: 0.058112387377768755
MSE after 11100 iterations: 0.04558405727148056
MSE after 11200 iterations: 0.03936733627691865
MSE after 11300 iterations: 0.03276601158082485
MSE after 11400 iterations: 0.027935269409790634

IndRNN still converges:
MSE after 1500 iterations: 0.1389778371155262
MSE after 1600 iterations: 0.1304377694427967
MSE after 1700 iterations: 0.11719750754535198
MSE after 1800 iterations: 0.10569956064224244
MSE after 1900 iterations: 0.09345896508544684
MSE after 2000 iterations: 0.08435331284999847
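For readers interpreting the MSE values, here is a rough sketch of the adding problem as it is commonly defined. The data generation in addition_test.py may differ, and the function name and sizes here are hypothetical. Each input step carries a random value and a marker, exactly two steps are marked, and the target is the sum of the two marked values. A model that always predicts 1.0 has an expected MSE of about 1/6 (roughly 0.167), so a loss near that level usually means the network has not yet learned the task.

```python
import random


def make_adding_example(time_steps, rng):
    """Return (sequence, target) for one adding-problem sample.

    sequence is a list of [value, marker] pairs; target is the sum of the
    two values whose marker is 1.0. This follows the common definition and
    is an assumption about the task, not a copy of the repository's code.
    """
    values = [rng.random() for _ in range(time_steps)]
    i, j = rng.sample(range(time_steps), 2)  # two distinct marked positions
    markers = [0.0] * time_steps
    markers[i] = 1.0
    markers[j] = 1.0
    sequence = [[v, m] for v, m in zip(values, markers)]
    return sequence, values[i] + values[j]


rng = random.Random(0)
examples = [make_adding_example(100, rng) for _ in range(2000)]
targets = [target for _, target in examples]

# Baseline: always predict 1.0, the mean of the sum of two uniform values.
baseline_mse = sum((t - 1.0) ** 2 for t in targets) / len(targets)
print(f"baseline MSE: {baseline_mse:.3f}")
```

Against this baseline, the logged values above start just below the constant-prediction level and then drop well under it.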

StefOe / indrnn-pytorch

Can not converge with LSTM #3