Open kleinzcy opened 9 months ago
The training is relatively stable and consistent across individual runs on our end; it does not vary as much as your results suggest. Could you share your training hyperparameters and the details of the model checkpoint you used for evaluation (i.e., the checkpoint from which epoch)? One common practice is to enable the --use_ema flag during training to mitigate large oscillations in model performance.
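For reference, the idea behind --use_ema is to keep an exponential moving average of the model weights and evaluate that smoothed copy instead of the raw checkpoint. A minimal sketch of the technique (not the repo's actual implementation; the class name and the decay value here are illustrative) looks like:

```python
import copy
import torch


class EMA:
    """Keep an exponential moving average (shadow copy) of a model's weights."""

    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        # Frozen shadow copy that accumulates the moving average.
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module) -> None:
        # shadow <- decay * shadow + (1 - decay) * current weights
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```

After each optimizer step you would call `ema.update(model)`, and at evaluation time compute FID with `ema.shadow` rather than the live model; because the average changes slowly, scores oscillate much less between nearby epochs.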
Thanks for your reply. The script I use is as follows:
accelerate launch --main_process_port 33996 --num_processes 2 train_flow_latent.py --exp celeb_f8_dit_g2 \
--dataset celeba_256 --datadir celeba_hq/celeba-lmdb \
--batch_size 32 --num_epoch 500 \
--image_size 256 --f 8 --num_in_channels 4 --num_out_channels 4 \
--nf 256 --ch_mult 1 2 3 4 --attn_resolution 16 8 4 --num_res_blocks 2 \
--lr 2e-4 --scale_factor 0.18215 --no_lr_decay \
--model_type DiT-L/2 --num_classes 1 --label_dropout 0. \
--save_content --save_content_every 10
And I used the checkpoints at epochs 474 and 500 for evaluation. I will try --use_ema.
Hi, authors:
Thanks for your work and code. I tried to run your code on 2 A100 GPUs, but the result I get is ~7 FID, which is far from the reported 5.26 on CelebA 256x256. So I am curious about the stability of training: do the results vary much across runs?