Open cwchenwang opened 1 year ago
It is possible to apply the loss on the decoded image (and not pass the gradients through the vae encoder). However, in my experience the results weren't as good, and it's not faithful to the description in the original Dreamfusion paper. I think this alternative is mentioned in the Score Jacobian Chaining paper.
Hi, I wonder if you have figured this out. I don't get why the encoder needs to be updated, regardless of where the loss is applied. An image-space loss would probably result in a blurry/over-saturated outcome, since there is no constraint on consistency between the generated images.
I think the encoder is part of the diffusion model and doesn't need to be trained. So why is it being trained here?
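To make the distinction discussed above concrete, here is a minimal sketch of the two gradient paths. This is a toy illustration, not the repo's actual code: `encoder`, `decoder`, and the epsilon prediction are hypothetical stand-ins for the real VAE and UNet. In the standard DreamFusion-style SDS loss, the score is computed in latent space, so the gradient flows backward *through* the VAE encoder into the image; in the image-space alternative (mentioned in the Score Jacobian Chaining discussion above), the loss is applied to a decoded target and the encoder is bypassed in the backward pass. In neither case are the encoder's weights being trained; "pass the gradients through the encoder" only means it sits on the backpropagation path to the rendered image.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real VAE encoder/decoder and UNet
# (the real ones would come from e.g. Stable Diffusion).
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)           # image -> latent
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)  # latent -> image


def sds_grad_latent(image):
    """Latent-space SDS: gradient flows back through the VAE encoder.

    The encoder's weights are NOT updated; it only relays gradients
    to the rendered image (or the NeRF parameters behind it).
    """
    z = encoder(image)                  # differentiable encode
    noise = torch.randn_like(z)
    eps_pred = z + 0.1 * noise          # stand-in for the UNet's eps prediction
    grad = (eps_pred - noise).detach()  # SDS drops the UNet Jacobian
    loss = (grad * z).sum()             # surrogate loss with dL/dz = grad
    loss.backward()
    return image.grad


def image_space_grad(image):
    """Image-space alternative: loss on the decoded image, encoder bypassed."""
    with torch.no_grad():               # no gradient through encoder or decoder
        z = encoder(image)
        noise = torch.randn_like(z)
        eps_pred = z + 0.1 * noise
        target = decoder(z - (eps_pred - noise))  # "denoised" image target
    loss = ((image - target) ** 2).sum()
    loss.backward()
    return image.grad
```

In both functions the gradient lands on `image`; the difference is only whether the encoder sits on the backward path, which is what determines whether the result matches the original DreamFusion formulation.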