jy0205 opened 2 weeks ago
In Stage 1, we trained on the non-accelerated model (sdxl-base) using the conventional diffusion loss. For Stages 2 and 3, although all losses were computed on the Lightning branch in our first implementation, we believe it is also fine to place the diffusion loss on the non-accelerated model, and doing so also helps with compatibility.
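To make the two options concrete, here is a minimal sketch of how the per-stage loss could be assembled. It is not the PuLID code. Plain floats stand in for tensors, and the names `BranchLosses`, `stage_loss`, `diffusion_on_base` and `id_weight` are hypothetical. It assumes Stages 2 and 3 combine a diffusion loss with the ID loss by a simple weighted sum, and that both branches share the same trainable ID-adapter weights, so the base-model diffusion loss would still update the adapter.

```python
from dataclasses import dataclass


@dataclass
class BranchLosses:
    """Hypothetical per-step loss values (floats stand in for tensors)."""
    base_diffusion: float       # diffusion loss on the non-accelerated model (sdxl-base)
    lightning_diffusion: float  # diffusion loss on the Lightning T2I branch
    lightning_id: float         # ID loss on the Lightning T2I branch


def stage_loss(losses: BranchLosses, stage: int,
               diffusion_on_base: bool = True, id_weight: float = 1.0) -> float:
    """Combine the losses for one training step (assumed weighted sum).

    Stage 1: diffusion loss on the base model only.
    Stages 2 and 3: the ID loss always comes from the Lightning branch; the
    diffusion loss comes from the base model if diffusion_on_base is True,
    otherwise from the Lightning branch (as in the first implementation).
    """
    if stage not in (1, 2, 3):
        raise ValueError(f"unknown stage: {stage}")
    if stage == 1:
        return losses.base_diffusion
    diffusion = losses.base_diffusion if diffusion_on_base else losses.lightning_diffusion
    return diffusion + id_weight * losses.lightning_id


if __name__ == "__main__":
    step = BranchLosses(base_diffusion=0.5, lightning_diffusion=0.25, lightning_id=2.0)
    print(stage_loss(step, 1))                           # 0.5
    print(stage_loss(step, 2))                           # 2.5: base diffusion + Lightning ID
    print(stage_loss(step, 2, diffusion_on_base=False))  # 2.25: everything on Lightning
```

The only difference between the two variants is which branch supplies the diffusion term; the ID loss stays on the Lightning branch in both.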
Thanks for your quick response!
By the way, how many timesteps did you use for flux-dev PuLID training (when calculating the ID loss)?
Hi! Thanks for your extraordinary work! I have a question about the training. The paper mentions that "We introduce a Lightning T2I branch alongside the regular diffusion branch." But in the method section, all the losses are computed in the Lightning T2I branch. I'd like to know whether, during training, it is enough to use only the Lightning model (SDXL Lightning) instead of the original base model (SDXL). In other words, the original base model would not seem to need any trainable parameters. If so, where does the "alongside" come in?