Seanny123 / da-rnn

Dual-Stage Attention-Based Recurrent Neural Net for Time Series Prediction

Why not use tanh in the encoder while using it in the decoder? #11

Open chuanchuan12138 opened 5 years ago

chuanchuan12138 commented 5 years ago

Firstly, thanks for your code; it really helped me understand the paper. But when I debugged the code, I found that in modules.py tanh is used in the decoder but omitted in the encoder, whereas in the paper both formula 8 and formula 12 use tanh to compute part of the attention weights. I don't know why. Can anybody offer some help? Thanks in advance!
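For reference, formula 8 (the encoder's input-attention score) and formula 12 (the decoder's temporal-attention score) in the paper both wrap the affine term in tanh before the final projection. Below is a minimal PyTorch sketch of the formula-8 score only; the layer and tensor names are chosen here for illustration and are not the variable names used in modules.py.

```python
import torch
import torch.nn as nn

class InputAttentionScore(nn.Module):
    """Sketch of the paper's Eq. 8:
    e_t^k = v_e^T tanh(W_e [h_{t-1}; s_{t-1}] + U_e x^k)
    The decoder score (Eq. 12) has the same form, applied to encoder hidden states."""

    def __init__(self, hidden_size: int, seq_len: int):
        super().__init__()
        self.W_e = nn.Linear(2 * hidden_size, seq_len, bias=False)  # maps [h; s] to length-T space
        self.U_e = nn.Linear(seq_len, seq_len, bias=False)          # maps the k-th driving series
        self.v_e = nn.Linear(seq_len, 1, bias=False)                # final projection to a scalar score

    def forward(self, h_prev: torch.Tensor, s_prev: torch.Tensor, x_k: torch.Tensor) -> torch.Tensor:
        # h_prev, s_prev: (batch, hidden_size); x_k: (batch, seq_len) for one driving series k
        z = self.W_e(torch.cat([h_prev, s_prev], dim=1)) + self.U_e(x_k)
        return self.v_e(torch.tanh(z))  # tanh applied before projecting with v_e, as in the paper
```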

ljtruong commented 4 years ago

Here's my experiment with and without tanh in the encoder. Note that I set my model to eval mode and wrapped prediction in no_grad, and used no_grad during validation as well, which differs from this repo; I believe it should have been implemented that way.
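For clarity, this is the standard eval/no_grad pattern I mean (a generic PyTorch sketch, not code taken from this repo; the function and variable names are illustrative):

```python
import torch

def validate(model, loader, criterion, device="cpu"):
    """Run a validation pass without tracking gradients or updating dropout/batch-norm behaviour."""
    model.eval()                      # switch dropout/batch-norm to inference behaviour
    total_loss = 0.0
    with torch.no_grad():             # disable autograd bookkeeping during validation
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            total_loss += criterion(model(x), y).item()
    model.train()                     # restore training mode for the next epoch
    return total_loss / len(loader)
```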

[Plot: predictions without tanh in the encoder]

[Plot: predictions with tanh in the encoder]

In addition, during training the validation loss decreases faster with tanh. [Plots: validation loss over 10 epochs, with and without tanh in the encoder]

Note: I trained, validated, and predicted over the whole dataset for testing purposes. My assumption was that I should get near 99%+ accuracy if the underlying equations are working properly.

chuanchuan12138 commented 4 years ago

Hi worulz, thanks for your careful experiment; it really clears up my confusion. As for your no_grad operation: I think main.py doesn't include a separate validation or prediction step, it just trains the model, and in my opinion the predict function there only aims to show the loss for that training epoch, so you may consider it part of the training process. I'm not sure whether that's correct, but I think no_grad belongs in a validation or test process, so it's necessary if you want to evaluate the model, just not in this place, maybe in another function. Thank you again for your clear comparison plots.