SUDO-AI-3D / zero123plus

Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
Apache License 2.0

Question about ReferenceOnlyAttnProc #63

Closed by Time-Lord12th 4 months ago

Time-Lord12th commented 5 months ago

Hello, I'm studying pipeline.py. ReferenceOnlyAttnProc is the implementation of “appending the self-attention K and V”, right? I wonder what mode == 'm' is for, since I found that mode == 'w' stores encoder_hidden_states and mode == 'r' appends them. I suspect it is there to keep the computation graph complete for backpropagation.

eliphatfs commented 5 months ago

M is for reading 'Multiple' times. It was once used for an ablation study in which the input image to the reference branch was not noised to the same level as the latents; in that case the KV matrices can be reused across multiple inference steps. The M mode is not used in the current Zero123++ architecture.
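
For readers following the thread, the three modes can be illustrated with a minimal sketch. This is not the code from pipeline.py: the function name `reference_tokens`, the layer name `"attn1"`, and the use of plain Python lists in place of tensors are all hypothetical. It also assumes that 'r' consumes the stored entry after one read while 'm' leaves it in place, which is one plausible way to realise "read once" versus "read multiple times".

```python
def reference_tokens(name, tokens, mode, ref_dict):
    """Return the token sequence a self-attention layer would use for K and V.

    Hypothetical helper: plain lists stand in for tensors, and list
    concatenation stands in for concatenation along the token axis.
    """
    if mode == "w":
        # Write: the reference branch stores its own tokens under this layer's name.
        ref_dict[name] = list(tokens)
        return list(tokens)
    if mode == "r":
        # Read once: append the stored reference tokens, then drop the entry
        # (assumption: the entry is consumed so it must be rewritten each step).
        return list(tokens) + ref_dict.pop(name)
    if mode == "m":
        # Read multiple: append the stored reference tokens but keep the entry,
        # so later denoising steps can reuse it without another write pass.
        return list(tokens) + ref_dict[name]
    raise ValueError(f"unknown mode: {mode!r}")


ref_dict = {}

# One write pass on the reference image.
reference_tokens("attn1", ["ref0", "ref1"], "w", ref_dict)

# Two denoising steps reuse the same stored tokens in 'm' mode.
step1 = reference_tokens("attn1", ["x0"], "m", ref_dict)
step2 = reference_tokens("attn1", ["y0"], "m", ref_dict)

# A final 'r' read consumes the entry.
final = reference_tokens("attn1", ["z0"], "r", ref_dict)
```

Under these assumptions, 'm' only pays off when the reference tokens do not change between steps, which matches the ablation setting described above where the reference image is not noised along with the latents.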