Problem of reproducing the VQ-Diffusion-S results on CUB-200

Yikai-Wang commented 2 years ago

Hi there,

Thanks for your excellent work! I am trying to reproduce the results of VQ-Diffusion-S on CUB-200 with the provided configs. However, the trained model fails to generate high-fidelity images, and its FID score is above 30.

I checked the code and dataset but could not locate the problem. Could you give me some suggestions for reproducing the results, such as which hyper-parameters I should try changing?

Thanks a lot!

cientgu commented 2 years ago

First, check the FID calculation. For a fair comparison with previous work, we calculate the FID by oversampling 30k real images and comparing them with 30k generated images. Second, we evaluate the FID score every 30 epochs and select the best one. The default setting is 400 epochs; however, we achieved the best FID score at about 270 epochs. I am not sure about your model. Third, can you get reasonable results with VQ-Diffusion-B or VQ-Diffusion-F on CUB-200?
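As an illustration of the oversampling step, here is a minimal sketch of how a smaller real-image set could be expanded to 30k samples before computing FID. This is an assumption about one reasonable way to do it, not the script used for the paper: the function name `oversample_indices` and the dataset size of 12,000 are hypothetical placeholders.

```python
import random


def oversample_indices(num_real, target, seed=0):
    """Return `target` indices into a real-image set of size `num_real`.

    Hypothetical helper: every image is repeated floor(target / num_real)
    times, and the remainder is drawn without replacement, so per-image
    counts differ by at most one.
    """
    if num_real <= 0:
        raise ValueError("num_real must be positive")
    rng = random.Random(seed)
    full_passes, remainder = divmod(target, num_real)
    indices = list(range(num_real)) * full_passes
    indices += rng.sample(range(num_real), remainder)
    return indices


# Placeholder size for the real set; substitute the actual number of images.
idx = oversample_indices(num_real=12_000, target=30_000)
```

The resulting index list can then be used to load the real images that are compared against the 30k generated ones.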

Yikai-Wang commented 2 years ago

How many GPUs did you use for training VQ-Diffusion-S? I checked the hyper-parameters again and found that the learning rate depends on the number of training iterations, which in turn depends on the batch size and the number of GPUs used in training.

In my environment, I use 8 GPUs, so the ReduceLROnPlateauWithWarmup lr scheduler decreases the lr after about 90 epochs. Maybe this is the reason?
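To make the dependence on GPU count concrete, the sketch below computes how many iterations make up one epoch and which epoch a given iteration falls in. All numbers (8,000 samples, a per-GPU batch size of 4, iteration 20,000) are hypothetical placeholders, not values from the provided configs.

```python
import math


def iterations_per_epoch(num_samples, batch_size_per_gpu, num_gpus):
    """Number of optimizer steps in one epoch under data-parallel training."""
    global_batch = batch_size_per_gpu * num_gpus
    return math.ceil(num_samples / global_batch)


def epoch_of_iteration(iteration, num_samples, batch_size_per_gpu, num_gpus):
    """Zero-based epoch index in which the given iteration occurs."""
    return iteration // iterations_per_epoch(
        num_samples, batch_size_per_gpu, num_gpus
    )


# Hypothetical values: the same iteration count lands in very different epochs.
steps_8_gpus = iterations_per_epoch(8_000, 4, 8)   # 250 steps per epoch
steps_1_gpu = iterations_per_epoch(8_000, 4, 1)    # 2000 steps per epoch
epoch_8_gpus = epoch_of_iteration(20_000, 8_000, 4, 8)
epoch_1_gpu = epoch_of_iteration(20_000, 8_000, 4, 1)
```

With these placeholder values, iteration 20,000 is reached in epoch 80 on 8 GPUs but in epoch 10 on a single GPU, which is why an iteration-driven scheduler behaves differently across setups.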

Besides, the training loss was stuck between 6.5 and 6.8 after training for 10k iterations (on 8 GPUs), and the generated images are far from realistic. So the FID calculation is not the problem in my experiments. I have not run experiments on VQ-Diffusion-B or VQ-Diffusion-F.

cientgu commented 2 years ago

Yes, the lr is correlated with the training iterations and batch size. We used 8 GPUs to train the model, and a first lr decrease at about 90 epochs makes sense to me. The training loss decreases very slowly; this seems to be a general drawback of diffusion models for now (not only discrete models but also continuous ones). I am not sure whether there are mistakes in your experiments. Maybe you should check whether the validation loss and FID are decreasing. If so, please consider training for more epochs by setting max_epochs to a larger number.
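For readers unfamiliar with this kind of schedule, the following is a simplified stand-in that combines linear warmup with a reduce-on-plateau rule. It is not the repository's ReduceLROnPlateauWithWarmup; the class name `PlateauWithWarmup`, its parameters, and its defaults are assumptions chosen for illustration. The point is only that a scheduler stepped once per iteration reacts after a fixed number of iterations, not a fixed number of epochs.

```python
class PlateauWithWarmup:
    """Simplified sketch, NOT the repository's ReduceLROnPlateauWithWarmup."""

    def __init__(self, base_lr, warmup_steps, factor=0.5, patience=3, min_lr=0.0):
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        self.factor = factor        # multiplicative decay applied on a plateau
        self.patience = patience    # non-improving steps tolerated before decay
        self.min_lr = min_lr
        self.lr = 0.0
        self.step_count = 0
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, loss):
        """Advance one step with the latest loss and return the new lr."""
        self.step_count += 1
        if self.step_count <= self.warmup_steps:
            # Linear warmup from 0 to base_lr; the loss is ignored here.
            self.lr = self.base_lr * self.step_count / self.warmup_steps
            return self.lr
        if loss < self.best:
            self.best = loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_steps = 0
        return self.lr


sched = PlateauWithWarmup(base_lr=1.0, warmup_steps=2, factor=0.5, patience=1)
lrs = [sched.step(loss) for loss in [5.0, 5.0, 4.0, 4.0, 4.0]]
```

In this toy run the lr warms up over two steps, holds while the loss improves, and is halved once the loss has failed to improve for more than `patience` steps.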

Yikai-Wang commented 2 years ago

Thanks for the prompt reply. I am trying to train for more epochs.

Besides, did you use sync_bn, cudnn_deterministic, or amp in your experiments? These are not included in your example commands but are available in your code. Knowing this would help me reproduce your experiments under the same settings.

cientgu commented 2 years ago

No, we do not use them. Some of these settings are for further experiments or debugging.

Yikai-Wang commented 2 years ago

Thanks for the prompt reply. I can reproduce the results now.

XHMY commented 2 years ago

> First, check the FID calculation. For a fair comparison with previous work, we calculate the FID by oversampling 30k real images and comparing them with 30k generated images. Second, we evaluate the FID score every 30 epochs and select the best one. The default setting is 400 epochs; however, we achieved the best FID score at about 270 epochs. I am not sure about your model. Third, can you get reasonable results with VQ-Diffusion-B or VQ-Diffusion-F on CUB-200?

Where can I get the code or script to evaluate the FID score every 30 epochs? I did not find the evaluation code in your repository. @cientgu
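No evaluation script is referenced in this thread. As a rough sketch of the procedure described above (evaluate at a fixed epoch interval and keep the best checkpoint), something like the following could serve as a starting point. The function `select_best_checkpoint` and the `evaluate_fid` callback are hypothetical; in practice the callback would load the checkpoint for that epoch, generate samples, and compute FID with whatever tool you use. The toy callback below returns synthetic numbers purely so the example runs, and they are not real results.

```python
def select_best_checkpoint(max_epochs, eval_every, evaluate_fid):
    """Evaluate every `eval_every` epochs and return (best_epoch, best_fid).

    `evaluate_fid` is a user-supplied callable mapping an epoch number to
    the FID of the checkpoint saved at that epoch (lower is better).
    """
    best_epoch, best_fid = None, float("inf")
    for epoch in range(eval_every, max_epochs + 1, eval_every):
        fid = evaluate_fid(epoch)
        if fid < best_fid:
            best_epoch, best_fid = epoch, fid
    return best_epoch, best_fid


def toy_fid(epoch):
    """Synthetic placeholder curve, NOT measured FID values."""
    return abs(epoch - 270) / 10 + 10.0


best_epoch, best_fid = select_best_checkpoint(
    max_epochs=400, eval_every=30, evaluate_fid=toy_fid
)
```

Passing the evaluation as a callback keeps the selection logic independent of how checkpoints are stored or how FID is computed.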

cientgu / VQ-Diffusion

Problem of reproducing the VQ-Diffusion-S results on CUB-200 #9