Closed by a-r-r-o-w 4 days ago
DeepSpeed errors out with:
The Details block unfolds to something blank.
Oh sorry, really weird! Updated.
Thanks. How can I reproduce the error?
I think you just have to change the config in the train_text_to_video_lora.sh file to use the DeepSpeed one.
This is what I'm using for example (from the root folder of the repo):
export TORCH_LOGS="+dynamo,recompiles,graph_breaks"
export TORCHDYNAMO_VERBOSE=1
export WANDB_MODE="offline"
export NCCL_P2P_DISABLE=1
export TORCH_NCCL_ENABLE_MONITORING=0
GPU_IDS="2,3"
DATA_ROOT="training/dump"
CAPTION_COLUMN="prompts.txt"
VIDEO_COLUMN="videos.txt"
cmd="accelerate launch --config_file accelerate_configs/deepspeed.yaml --gpu_ids $GPU_IDS training/cogvideox_text_to_video_lora.py \
--pretrained_model_name_or_path THUDM/CogVideoX-5b \
--data_root $DATA_ROOT \
--caption_column $CAPTION_COLUMN \
--video_column $VIDEO_COLUMN \
--id_token BW_STYLE \
--height_buckets 480 \
--width_buckets 720 \
--frame_buckets 49 \
--load_tensors \
--validation_prompt \"BW_STYLE A black and white animated scene unfolds with an anthropomorphic goat surrounded by musical notes and symbols, suggesting a playful environment. Mickey Mouse appears, leaning forward in curiosity as the goat remains still. The goat then engages with Mickey, who bends down to converse or react. The dynamics shift as Mickey grabs the goat, potentially in surprise or playfulness, amidst a minimalistic background. The scene captures the evolving relationship between the two characters in a whimsical, animated setting, emphasizing their interactions and emotions\" \
--validation_prompt_separator ::: \
--num_validation_videos 1 \
--validation_epochs 1 \
--seed 42 \
--rank 64 \
--lora_alpha 64 \
--mixed_precision bf16 \
--output_dir /raid/aryan/cogvideox-lora \
--max_num_frames 49 \
--train_batch_size 1 \
--max_train_steps 3000 \
--checkpointing_steps 1000 \
--gradient_accumulation_steps 1 \
--gradient_checkpointing \
--learning_rate 0.0001 \
--lr_scheduler constant \
--lr_warmup_steps 200 \
--lr_num_cycles 1 \
--enable_slicing \
--enable_tiling \
--optimizer adamw \
--beta1 0.9 \
--beta2 0.95 \
--beta3 0.99 \
--weight_decay 0.001 \
--max_grad_norm 1.0 \
--allow_tf32 \
--report_to wandb \
--nccl_timeout 1800"
echo "Running command: $cmd"
eval $cmd
echo -ne "-------------------- Finished executing script --------------------\n\n"
@a-r-r-o-w DeepSpeed seems to be working.
DeepSpeed errors out with: (cc @sayakpaul)
DDP + uncompiled: works
DDP + compiled: does not work. I don't think this combination has ever worked for me, and from some quick googling the two don't seem to be compatible with each other.
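To make the three settings above easier to reproduce one at a time, here is a rough sketch that picks the accelerate config by mode. It assumes each setting is selected through its own accelerate config file. Only accelerate_configs/deepspeed.yaml appears in this thread; the two DDP file names and the `pick_config` helper are hypothetical placeholders, so substitute whatever your checkout actually has.

```shell
#!/usr/bin/env bash
# Sketch: map a mode name to an accelerate config file.
# Only deepspeed.yaml is taken from the thread; the DDP names are hypothetical.
pick_config() {
  case "$1" in
    deepspeed)    echo "accelerate_configs/deepspeed.yaml" ;;
    ddp)          echo "accelerate_configs/ddp_uncompiled.yaml" ;;  # hypothetical name
    ddp-compiled) echo "accelerate_configs/ddp_compiled.yaml" ;;    # hypothetical name
    *)            echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

MODE="${MODE:-deepspeed}"
CONFIG_FILE="$(pick_config "$MODE")" || exit 1

# Print the launch prefix instead of running it, so the choice can be checked first.
echo "accelerate launch --config_file $CONFIG_FILE --gpu_ids ${GPU_IDS:-2,3} training/cogvideox_text_to_video_lora.py"
```

Running it with `MODE=ddp-compiled` prints the launch prefix for that setting; the remaining training flags from the script above would be appended unchanged.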