yangfan293 opened 1 year ago
Hello, may I ask whether the visualization in Figure 5 is drawn directly from `attn_output_weights.sum(dim=1)/num_heads` of the depth cross-attention layer? Why do the maps drawn from my trained model look so different from yours?
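Here is a minimal PyTorch sketch of the averaging described above. It assumes the depth cross-attention layer is a standard `torch.nn.MultiheadAttention`, although the issue does not say which implementation the repository uses. All shapes and variable names are hypothetical. The sketch checks that `sum(dim=1) / num_heads` over per-head weights equals the head-averaged weights PyTorch returns by default. It then reshapes one query's row back onto the spatial grid.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 2 samples, 10 query tokens, a 7x7 grid of depth tokens
embed_dim, num_heads = 32, 4
batch, num_queries, h, w = 2, 10, 7, 7

# Assumption: the depth cross-attention is a plain nn.MultiheadAttention
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
attn.eval()

query = torch.randn(batch, num_queries, embed_dim)
depth_tokens = torch.randn(batch, h * w, embed_dim)

with torch.no_grad():
    # Per-head weights, shape (batch, num_heads, num_queries, h*w)
    _, per_head = attn(query, depth_tokens, depth_tokens,
                       need_weights=True, average_attn_weights=False)
    # Default output: weights already averaged over heads, shape (batch, num_queries, h*w)
    _, averaged = attn(query, depth_tokens, depth_tokens, need_weights=True)

# The formula from the question: sum over the head dimension, divide by num_heads
manual = per_head.sum(dim=1) / num_heads
print(torch.allclose(manual, averaged, atol=1e-6))

# Attention of query 0 (sample 0) over the depth tokens, reshaped to the 7x7 grid
heatmap = manual[0, 0].reshape(h, w)
# Min-max normalize to [0, 1] for display (one of several possible display choices)
heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min())
```

Under this assumption, the two computations are equivalent. The choice of query row and the display normalization are separate decisions that also change how the final map looks.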