Open wusize opened 4 months ago
I have encountered this before and found that reducing the number of warm-up steps solved it.
Thanks for the feedback! I have an additional question: why are the discriminator's warm-up steps set to 500000 (--dis_warmup_steps 500000), i.e., the discriminator loss is increased linearly across the whole training process?
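To make the question concrete, here is a minimal sketch of the kind of linear warm-up I have in mind. It assumes the discriminator loss is scaled by a factor that ramps from 0 to 1 over the warm-up steps. The function name `linear_warmup_weight` is hypothetical and not taken from the repository; the actual schedule in the script may differ.

```python
def linear_warmup_weight(step: int, warmup_steps: int) -> float:
    """Hypothetical linear warm-up factor for the discriminator loss.

    Ramps from 0.0 at step 0 to 1.0 at `warmup_steps`, then stays at 1.0.
    This is an assumed schedule, not the repository's implementation.
    """
    if warmup_steps <= 0:
        # No warm-up requested: use the full weight from the start.
        return 1.0
    # Clamp so negative steps give 0.0 and steps past warm-up give 1.0.
    return min(max(step, 0) / warmup_steps, 1.0)


if __name__ == "__main__":
    # With a 500000-step warm-up, the factor is still tiny at a few thousand steps.
    for step in (0, 3_000, 6_000, 250_000, 500_000):
        print(step, linear_warmup_weight(step, 500_000))
```

Under this assumption, if the total number of training steps is also 500000, the factor only reaches 1.0 at the very last step, which is what prompted the question.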
Could you share more details, e.g., what type of data you used? Thanks.
Hi, thanks for your great work! I am trying to reproduce VQGAN on ImageNet by running this script (stage 1). However, training always collapses between 3k and 6k iterations with NaN in the losses. Is there any trick to avoid NaN during training?