Closed Time-Lord12th closed 4 months ago
M is for reading 'Multiple' times. It was once used for an ablation study when the input image to the reference branch is not noised to the same level as the latents, in which case the KV matrices can be reused for multiple steps in inference. The M mode is not used for the current Zero123++ architecture.
Hello, I'm studying the![image](https://github.com/SUDO-AI-3D/zero123plus/assets/53140195/da049a7c-8ec9-4d87-bf70-f5c34eef6f93)
pipeline.py
. TheReferenceOnlyAttnProc
is the implementation of “appending the self-attention K and V”, right? I wonder what ismode == 'm'
for, since I foundmode == 'w'
is for storing encoder_hidden_states, andmode == 'r'
is for appending. I suspect this is to ensure the completeness of the computation graph for the backward propagation.