fast-codi / CoDi

[CVPR24] CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
https://fast-codi.github.io/
Problems with training on Hugging Face datasets. #4

Open 00757039 opened 1 month ago

00757039 commented 1 month ago

export HF_HOME="/data/kmei1/huggingface/"
export DISK_DIR="/data/kmei1/huggingface/cache"
export MODEL_DIR="stabilityai/stable-diffusion-2-1"
export OUTPUT_DIR="canny_model"
export DATASET_NAME="jax-diffusers-event/canny_diffusiondb"
export NCCL_P2P_DISABLE=1
export CUDA_VISIBLE_DEVICES=5

python3 train_codi_flax.py \
  --pretrained_model_name_or_path $MODEL_DIR \
  --output_dir $OUTPUT_DIR \
  --dataset_name $DATASET_NAME \
  --load_from_disk \
  --cache_dir $DISK_DIR \
  --resolution 512 \
  --learning_rate 8e-6 \
  --train_batch_size 2 \
  --gradient_accumulation_steps 2 \
  --revision main \
  --from_pt \
  --mixed_precision bf16 \
  --max_train_steps 200_000 \
  --checkpointing_steps 10_000 \
  --validation_steps 100 \
  --dataloader_num_workers 8 \
  --distill_learning_steps 20 \
  --ema_decay 0.99995 \
  --onestepode uncontrol \
  --onestepode_control_params target \
  --onestepode_sample_eps vprediction \
  --cfg_aware_distill \
  --distill_loss consistency_x \
  --distill_type conditional \
  --image_column original_image \
  --caption_column prompt \
  --conditioning_image transformed_image \
  --report_to wandb \
  --validation_image "figs/control_bird_canny.png" \
  --validation_prompt "birds"

Hello! When I run the training command above (with HF_HOME and DISK_DIR changed to my own paths), the loss becomes NaN. Could you please help me understand why?

MKFMIKU commented 1 month ago

Could you please share a plot of your loss curve? Training should be stable, and NaN losses are rare. @00757039
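
Alongside a loss curve, it may help to report the step at which the loss first stops being finite. The following is only a minimal sketch and is not part of the CoDi repository: the helper name `first_nonfinite_step` and the example loss values are hypothetical, and it assumes the per-step losses have been collected into a Python list (for example, exported from the wandb run).

```python
import math


def first_nonfinite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical loss history, used only to illustrate the helper.
# In practice these would be the per-step values logged during training.
example_losses = [0.91, 0.74, 0.69, float("nan"), float("nan")]
bad_step = first_nonfinite_step(example_losses)
print(bad_step)
```

Knowing whether the loss is NaN from the first step or only after some number of steps can narrow down the cause. JAX also has a `jax_debug_nans` configuration option that raises an error when a NaN is produced; whether it works well with this training script is untested here.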