Closed — bufuchangfeng closed this issue 9 months ago
I'm having exactly the same problem. I wrote my own model based on the code provided in the 3rd/4th tutorial (i.e., adding attention), and I'm using my own data (specifically, the Tatoeba es-en corpus from Opus). When I inspect the attention, the results are the same as the ones you showed: it looks as if the attention is "looking" at the last tokens. I've gone over the code about 30 times, and after fixing a few mistakes I still can't find the problem...
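In case it helps anyone trying to reproduce this, below is a minimal sketch of how I check the pattern: plotting the attention weights as a heatmap. The function and argument names (`plot_attention`, `attention` as a `[trg_len, src_len]` matrix) are my own, not from the tutorial.

```python
import matplotlib.pyplot as plt

def plot_attention(attention, src_tokens, trg_tokens):
    """Plot a [trg_len, src_len] attention matrix as a heatmap.

    `attention` is assumed to be a NumPy array (or a torch tensor
    converted with .detach().cpu().numpy()) whose rows sum to 1.
    """
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.matshow(attention, cmap="bone")
    ax.set_xticks(range(len(src_tokens)))
    ax.set_yticks(range(len(trg_tokens)))
    ax.set_xticklabels(src_tokens, rotation=45)
    ax.set_yticklabels(trg_tokens)
    plt.show()
```

If the heatmap shows a bright column over the last source positions for every target step, that is the "looking at the last tokens" pattern I mean.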
@bufuchangfeng did you solve this problem?
I figured out what my problem was. It was related to the amount of data I was providing. It seems that if you provide too little data, the attention does not generalize well and, for some reason I don't understand, it ends up focusing on the last tokens. Now, with 100,000 sentences (plus ~80,000 sentences very similar to the initial 100,000, for a total of ~180,000 sentences), split 60/20/20 into train/dev/test, the attention focuses on the expected tokens.
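For what it's worth, here is a minimal sketch of the kind of split I mean. The function name and the list-of-sentence-pairs format are just assumptions on my part, not code from the tutorial.

```python
import random

def split_corpus(pairs, seed=1234):
    """Shuffle parallel sentence pairs and split them 60/20/20
    into train/dev/test (the ratios mentioned above)."""
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_train = int(0.6 * n)
    n_dev = int(0.2 * n)
    train = pairs[:n_train]
    dev = pairs[n_train:n_train + n_dev]
    test = pairs[n_train + n_dev:]
    return train, dev, test
```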
I'm just leaving this comment here in case it helps someone :)
I wrote an attention mechanism (based on an LSTM) for NMT on my own data by studying your code, but the attention weights don't look normal. Here is the code. Can I get some help from you? Thanks very much.
code
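This isn't your actual code (that's behind the link above), but for comparison, here is a minimal sketch of an additive (Bahdanau-style) attention layer in PyTorch for an LSTM decoder. The class name, parameter names, and tensor layout are my own assumptions, not taken from the repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score each encoder state against the
    current decoder hidden state, then softmax over the source length."""

    def __init__(self, enc_hid_dim, dec_hid_dim, attn_dim):
        super().__init__()
        self.attn = nn.Linear(enc_hid_dim + dec_hid_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, decoder_hidden, encoder_outputs, mask=None):
        # decoder_hidden:  [batch, dec_hid_dim]
        # encoder_outputs: [batch, src_len, enc_hid_dim]
        src_len = encoder_outputs.shape[1]
        # repeat the decoder state once per source position
        hidden = decoder_hidden.unsqueeze(1).repeat(1, src_len, 1)
        energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)          # [batch, src_len]
        if mask is not None:
            # a common bug: forgetting to mask <pad> positions can push
            # the weights toward padded (trailing) tokens
            scores = scores.masked_fill(mask == 0, -1e10)
        return F.softmax(scores, dim=1)             # attention weights
```

If your weights always peak at the end of the sequence, it may be worth checking (a) whether padding positions are masked before the softmax, and (b) whether the training set is large enough, as discussed above.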