bentrevett / pytorch-seq2seq

Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.

Can I get some help? Thanks! #144

Closed bufuchangfeng closed 9 months ago

bufuchangfeng commented 3 years ago

I wrote an attention mechanism (based on an LSTM) for NMT using my own data by studying your code, but the attention weights don't look normal. Here is the code. Can I get some help from you? Thanks very much. code
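For reference, the kind of attention module the tutorials build (and which I'm trying to reproduce) looks roughly like the sketch below; the class and variable names are illustrative, not my actual code. Each decoder step produces a softmax over the source positions, so a healthy model should spread those weights over the relevant source tokens rather than pile them onto the last ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention: scores each encoder position against the decoder state."""
    def __init__(self, enc_hid_dim, dec_hid_dim):
        super().__init__()
        # encoder is assumed bidirectional, hence enc_hid_dim * 2
        self.attn = nn.Linear(enc_hid_dim * 2 + dec_hid_dim, dec_hid_dim)
        self.v = nn.Linear(dec_hid_dim, 1, bias=False)

    def forward(self, hidden, encoder_outputs, mask=None):
        # hidden:          [batch, dec_hid_dim]
        # encoder_outputs: [src_len, batch, enc_hid_dim * 2]
        src_len = encoder_outputs.shape[0]
        hidden = hidden.unsqueeze(1).repeat(1, src_len, 1)   # [batch, src_len, dec_hid_dim]
        encoder_outputs = encoder_outputs.permute(1, 0, 2)   # [batch, src_len, enc_hid_dim * 2]
        energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim=2)))
        attention = self.v(energy).squeeze(2)                # [batch, src_len]
        if mask is not None:
            # without masking, <pad> positions receive weight and can skew the distribution
            attention = attention.masked_fill(mask == 0, -1e10)
        return F.softmax(attention, dim=1)                   # each row sums to 1 over source positions
```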

cgr71ii commented 2 years ago

I have exactly the same problem. I wrote my own model based on the code from the 3rd/4th tutorial (i.e. adding attention) and I'm using my own data (specifically, the Tatoeba es-en corpus from Opus). When I check the attention, the results are the same as what you showed: it looks as if the attention is "looking" at the last tokens. I've checked the code ~30 times and fixed a few mistakes, but I still don't see any problem...
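In case it helps anyone debugging the same thing, this is roughly how I inspect where the attention is looking. It assumes the model returns an attention tensor of shape [trg_len, src_len] for a translated sentence (the function and variable names below are illustrative):

```python
import matplotlib.pyplot as plt

def plot_attention(src_tokens, trg_tokens, attention):
    """Heat map of attention weights: rows = generated target tokens, columns = source tokens.
    `attention` is expected to be a torch tensor of shape [trg_len, src_len]."""
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.matshow(attention.cpu().detach().numpy(), cmap='bone')
    ax.set_xticks(range(len(src_tokens)))
    ax.set_yticks(range(len(trg_tokens)))
    ax.set_xticklabels(src_tokens, rotation=45)
    ax.set_yticklabels(trg_tokens)
    plt.show()
```

If the weights pile up in the last columns of the plot, that is the symptom described above.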

@bufuchangfeng did you solve this problem?

cgr71ii commented 2 years ago

[Screenshot: 2022-01-01 02-25-58]

I figured out what my problem was: the amount of data I was providing. If you train on too little data, the attention doesn't generalize well and, for some reason I don't understand, it seems to focus on the last tokens. Now, with 100,000 sentences (plus ~80,000 sentences very similar to the initial 100,000, for a total of ~180,000), split 60/20/20 into train/dev/test, the attention focuses on the expected tokens.
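For anyone reproducing this, a 60/20/20 split of the ~180,000 pairs can be done, for example, with torch.utils.data.random_split; `sentence_pairs` below is a placeholder for however the parallel data is stored:

```python
import torch
from torch.utils.data import random_split

# sentence_pairs: a list or Dataset of ~180,000 (src, trg) pairs (assumed already built)
n_total = len(sentence_pairs)
n_train = int(0.6 * n_total)
n_dev = int(0.2 * n_total)
n_test = n_total - n_train - n_dev   # remainder goes to test so the sizes add up exactly

train_data, dev_data, test_data = random_split(
    sentence_pairs, [n_train, n_dev, n_test],
    generator=torch.Generator().manual_seed(1234)  # fixed seed for a reproducible split
)
```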

I'll just leave this comment here in case it helps someone :)