crossattention map - GitHub Issues

wj7486 commented 2 months ago

Hello author, can you tell me how the cross-attention map in the paper was drawn? Which layer's cross-attention output is shown, and is it taken at timestep t=0? If you could reply, I would greatly appreciate it!

Rbrq03 commented 2 months ago

Hey @wj7486, thanks for your interest in our work. You can refer to https://github.com/Weifeng-Chen/prompt2prompt for the implementation of the cross-attention visualization. Please consider starring that repository to credit the author's great work. Besides, I am not quite sure what you mean by time=0. The y-axis of this figure is the total number of training steps; the model won't produce the given dog if you don't fine-tune it. I hope I understood your point. If you have further questions, feel free to reach out to us!
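For readers who want a rough picture of what such a visualization computes, here is a minimal NumPy sketch. It is not the code from the linked repository or from the paper. It assumes you have already captured the query and key tensors of one cross-attention layer (for example with a forward hook), that the latent grid is square, and that averaging over heads is acceptable. The function name `cross_attention_map` and its arguments are hypothetical.

```python
import numpy as np

def cross_attention_map(queries, keys, token_index):
    """Return an (H, W) heatmap for one prompt token (illustrative sketch).

    queries: (heads, H*W, d) image-side queries from one cross-attention layer
    keys:    (heads, n_tokens, d) text-side keys from the same layer
    """
    heads, n_pixels, dim = queries.shape
    # Scaled dot-product scores, shape (heads, H*W, n_tokens)
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(dim)
    # Numerically stable softmax over the token axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Assumption: the spatial positions form a square grid
    side = int(round(np.sqrt(n_pixels)))
    if side * side != n_pixels:
        raise ValueError("expected a square latent grid")
    # Average over heads, select the token's column, reshape to the grid
    heat = probs.mean(axis=0)[:, token_index].reshape(side, side)
    # Normalize to [0, 1] so it can be shown as an image
    return heat / heat.max()
```

The resulting array can be upsampled to the image resolution and overlaid on the generated image. Which layers and which denoising steps to aggregate over is a separate choice that this sketch leaves open.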

wj7486 commented 2 months ago

Thank you very much for your reply. DDIM sampling usually runs for 50 denoising steps (ddim_step=50), and my understanding is that the cross-attention map should be visualized at the final denoising step, that is, at timestep=0 rather than at timestep=50 or an intermediate step. Thank you again for your reply. I will try it out.
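To make the step numbering in this comment concrete, here is a small sketch of how a 50-step sampler might map step indices to diffusion timesteps. It assumes 1000 training timesteps and uniform spacing with no offset; real schedulers may shift these values by one or use a different spacing, and the helper name `ddim_timesteps` is hypothetical.

```python
import numpy as np

def ddim_timesteps(num_train_timesteps=1000, num_inference_steps=50):
    """Uniformly spaced timesteps in sampling order, noisiest first (sketch)."""
    stride = num_train_timesteps // num_inference_steps
    # Descending order: sampling starts at the noisiest timestep
    return np.arange(num_inference_steps)[::-1] * stride

timesteps = ddim_timesteps()
# Index of the last denoising step, where one would capture the map
# under the reading described in this comment
final_step_index = len(timesteps) - 1
```

Under these assumptions the sampler visits timesteps 980, 960, and so on down to 0, so the final denoising step is the last index in the array.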

Rbrq03 commented 2 months ago

Glad to hear that. Feel free to reopen this issue if you have further questions about it.

Rbrq03 / ClassDiffusion

crossattention map #8