vae bf16 training loss nan (pytorch_lightning), how to solve this?
Open lbwang2006 opened 2 months ago
Do you enable the GAN loss? We also run into this; it happens after ~30-50k steps. But it does not matter much: just resume training from the latest checkpoint.
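As a minimal sketch (assuming a PyTorch Lightning setup as in the issue title; `model`, `train_loader`, and the checkpoint path are placeholders from your own setup), resuming looks like this:

```python
import pytorch_lightning as pl

# Reuse your existing Trainer configuration here.
trainer = pl.Trainer(max_steps=200_000)

# ckpt_path restores the model weights, optimizer state, and global step,
# so training continues from where the last good checkpoint left off.
trainer.fit(model, train_dataloaders=train_loader, ckpt_path="checkpoints/last-good.ckpt")
```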
Yes, I enabled the GAN loss. The loss is NaN and does not recover. Is the only option to restart the training script from the latest good checkpoint?
And is the GAN loss necessary, given that it so easily leads to NaN loss?
The GAN loss plays a crucial role in preserving high-frequency information and should not be omitted.
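For intuition, here is a sketch in the style of taming-transformers' LPIPSWithDiscriminator (not necessarily the exact code in this repo): the generator-side adversarial term is just the negated discriminator logits on the reconstructions, added on top of the pixel/LPIPS terms.

```python
import torch

def generator_gan_loss(discriminator, reconstructions):
    # Non-saturating generator objective: push the discriminator's logits
    # on reconstructions upward. In practice this sharpens high-frequency
    # detail that the L1/LPIPS reconstruction terms alone tend to blur.
    logits_fake = discriminator(reconstructions)
    return -torch.mean(logits_fake)
```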
In v1.0.0 we didn't use the GAN loss. In v1.1.0 the VAE's capabilities will be vastly improved.
I found the config in the current CausalVAE; the loss type is opensora.models.ae.videobase.losses.LPIPSWithDiscriminator. I think the GAN loss has already been used?
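The entry I mean looks roughly like this (the loss type string is from the repo; the parameter names are my guess at a typical LPIPSWithDiscriminator setup, not copied from the actual file):

```json
{
  "loss_type": "opensora.models.ae.videobase.losses.LPIPSWithDiscriminator",
  "loss_params": {
    "disc_start": 2001,
    "kl_weight": 1e-06,
    "disc_weight": 0.5
  }
}
```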
Sorry for the confusion. Due to a previous code refactoring, the config.json file was added after the released CausalVAE had been trained. The released model was definitely trained without a GAN loss.
Thanks for the great project. When will you release the new version of the training code?
This month.
The nll_grads values easily exceed what bf16's limited precision can represent, so it is recommended not to use AMP training and to train in float32 instead.
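For context, nll_grads presumably refers to the adaptive discriminator weight used by LPIPSWithDiscriminator-style losses (following taming-transformers); a sketch of that calculation, which is where bf16 tends to break:

```python
import torch

def calculate_adaptive_weight(nll_loss, g_loss, last_layer):
    # Balance the GAN term against the reconstruction (NLL) term by the
    # ratio of their gradient norms at the decoder's last layer.
    nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
    g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
    # In bf16, the small mantissa makes these norms and their ratio
    # unstable, and a near-zero denominator can blow the ratio up.
    d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)
    return torch.clamp(d_weight, 0.0, 1e4).detach()
```

With PyTorch Lightning this means constructing the Trainer with `precision=32` rather than `precision="bf16-mixed"`.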
Regarding the earlier reply that v1.0.0 didn't use the GAN loss: I did find loss.discriminator keys in the v1.1.0 VAE weights, so it seems the GAN loss was used for v1.1.0...
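(A quick way to verify, assuming a standard Lightning checkpoint; the path is a placeholder:)

```python
import torch

ckpt = torch.load("v1.1.0/checkpoint.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)
# Any discriminator parameters saved alongside the VAE weights show up here.
print([k for k in state_dict if "loss.discriminator" in k])
```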