WillowKaze closed 6 months ago
Another thing that confuses me is the "scaled by a factor of 5" mentioned in the paper: is that implemented in the code? I could not find it.
By default, the SD VAE output needs to be rescaled by about 0.18 (vae.config.scaling_factor) before sending into diffusion; we skipped that step for the condition branch (so it is roughly scaled by a factor of 5), and we have an extra function called scale_latents that normalizes the residual by shifting and rescaling the latents according to statistics we compute with Objaverse renders.
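To make the arithmetic concrete, here is a minimal sketch of the idea described above, not the authors' implementation. Plain floats stand in for latent tensors. The value 0.18215 is the scaling factor commonly configured for SD VAEs (the "about 0.18" above); `LATENT_SHIFT` and `LATENT_SCALE` are hypothetical placeholders rather than the statistics computed from Objaverse renders, and the signatures of `scale_latents` and `unscale_latents` are assumed for illustration.

```python
# Illustrative sketch only: plain floats stand in for latent tensors.

# Commonly configured SD VAE scaling factor (the "about 0.18" above).
SCALING_FACTOR = 0.18215

# Hypothetical placeholder statistics, NOT the values used by the authors.
LATENT_SHIFT = 0.0
LATENT_SCALE = 1.0


def default_branch(raw_latent: float) -> float:
    """Usual path: multiply the VAE output by the scaling factor."""
    return raw_latent * SCALING_FACTOR


def condition_branch(raw_latent: float) -> float:
    """Condition path as described above: the rescaling step is skipped."""
    return raw_latent


def scale_latents(latent: float, shift: float = LATENT_SHIFT,
                  scale: float = LATENT_SCALE) -> float:
    """Normalize by shifting, then rescaling (assumed form)."""
    return (latent - shift) * scale


def unscale_latents(latent: float, shift: float = LATENT_SHIFT,
                    scale: float = LATENT_SCALE) -> float:
    """Inverse of scale_latents."""
    return latent / scale + shift


raw = 2.0
# How much larger the condition latent is than a default-scaled latent.
relative_factor = condition_branch(raw) / default_branch(raw)
print(f"relative factor: {relative_factor:.2f}")
```

The printed ratio is 1 / 0.18215, about 5.5, which is where the "roughly a factor of 5" comes from: nothing multiplies the condition latents by 5; they simply skip the multiplication by about 0.18.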
Excellent work! I was inspired by it and want to try it with some other work. I understand your decision not to release the training code, but I would appreciate it if you could share more of the details.
Thank you!