Hello, when I was building attention heatmaps, I found that the attention scores across different patches did not vary much. Have you encountered this problem before?
May I ask how you handled the data preprocessing? The author did not provide specific steps, so I am not sure which method to use.