FoundationVision / OmniTokenizer

[NeurIPS 2024] OmniTokenizer: one model and one set of weights for joint image-video tokenization.
https://www.wangjunke.info/OmniTokenizer/
MIT License

NaN value in loss #8

Open wusize opened 4 months ago

wusize commented 4 months ago

Hi, thanks for your great work! I am trying to reproduce VQGAN on ImageNet by running this script (stage 1). However, the training process always collapses between 3k and 6k iterations with NaN in the losses. Is there any trick to avoid NaN during training?

hyc9 commented 4 months ago

I have encountered this before and found that it can be solved by reducing the number of warm-up steps.
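Independent of the warm-up fix, a common safeguard is to skip the parameter update whenever the loss goes non-finite, so a single bad batch cannot corrupt the weights. A minimal sketch (hypothetical helper, not from the OmniTokenizer codebase):

```python
import math

def safe_step(loss_value: float, apply_update) -> bool:
    """Hypothetical guard: only apply the optimizer update when the
    scalar loss is finite; otherwise skip this batch entirely."""
    if not math.isfinite(loss_value):
        return False  # NaN/inf detected: drop the batch, keep old weights
    apply_update()    # e.g. optimizer.step() in a real training loop
    return True
```

In a PyTorch loop, `loss_value` would be `loss.item()` checked before `optimizer.step()`; skipped batches can also be logged to spot when instability starts.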

wusize commented 4 months ago

Thanks for the feedback! I have an additional question: why are the warm-up steps of the discriminator set to 500000 (--dis_warmup_steps 500000)? That means the discriminator loss is increased linearly across the whole training process.
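The linear ramp described above can be sketched as a simple weight schedule (hypothetical function names; the actual flag is `--dis_warmup_steps`):

```python
def disc_loss_weight(step: int, warmup_steps: int = 500_000,
                     max_weight: float = 1.0) -> float:
    """Hypothetical sketch of a linear discriminator-loss warm-up.

    The adversarial term's weight grows linearly from 0 to max_weight
    over warmup_steps; with warmup_steps equal to the total number of
    training steps, the ramp spans the entire run.
    """
    return max_weight * min(step / warmup_steps, 1.0)

# Halfway through a 500k-step run the adversarial term is at half strength.
print(disc_loss_weight(250_000))  # -> 0.5
```

A warm-up this long effectively keeps the GAN loss weak for most of training, which is presumably intentional to stabilize the reconstruction objective first.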

wdrink commented 4 months ago

Could you share more details, e.g., what type of data you used? Thanks.