Open MorrisXu-Driving opened 3 years ago
This line may reverse the weights: when computing MAX - Attention, the positions with the maximum attention weight become zero. I also could not find anything about this in the paper. Why was this line added?
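To make the concern concrete, here is a minimal sketch in plain Python. It assumes the line in question subtracts every score from the row maximum and that a softmax is applied afterwards; the softmax step is an assumption on my part, the scores are made up, and the helper names `softmax` and `max_minus` are hypothetical, not taken from the repository.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def max_minus(xs):
    # Hypothetical stand-in for the line under discussion: MAX - Attention.
    m = max(xs)
    return [m - x for x in xs]

# Made-up attention scores for one row; index 1 holds the largest score.
scores = [0.5, 2.0, 1.0]

plain = softmax(scores)               # softmax on the raw scores
flipped = softmax(max_minus(scores))  # softmax after MAX - Attention

print("raw scores:        ", scores)
print("MAX - scores:      ", max_minus(scores))
print("softmax(raw):      ", plain)
print("softmax(MAX - raw):", flipped)
```

In this toy case the position with the largest raw score becomes zero after the subtraction and ends up with the smallest weight after the softmax, so the ranking of the positions is reversed.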
Did you figure out the issue?
I just deleted this line and ran my own experiments; I did not spot any anomalies.