ArlenCHEN opened this issue 3 months ago
I trained a LoRA on the 768x768 model with 144 frames on an A100, using bf16, and the speed was ~10 s/it, FYI.
Thanks for the info!
Yeah, that's the kind of speed I'd expect to see, but sadly I'm not getting it. Anyway, I'll keep checking and update here with anything I find.
I met the same problem. I fine-tune the model with LoRA on V100 machines and the speed is about 40 s/it; without LoRA it is about 26 s/it.
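For what it's worth, when LoRA training is slower than the non-LoRA run, one quick sanity check is to confirm that only the LoRA weights are actually trainable. Below is a minimal sketch in plain PyTorch; `transformer` is a placeholder for whatever module the script fine-tunes, not a name from this repo:

```python
import torch

def summarize_trainable(model: torch.nn.Module) -> None:
    """Print how many parameters require gradients vs. are frozen."""
    trainable, frozen = 0, 0
    for _, p in model.named_parameters():
        if p.requires_grad:
            trainable += p.numel()
        else:
            frozen += p.numel()
    total = trainable + frozen
    print(f"trainable: {trainable / 1e6:.1f}M "
          f"({100.0 * trainable / max(total, 1):.2f}% of {total / 1e6:.1f}M)")

# Hypothetical usage: `transformer` stands in for the model being fine-tuned.
# summarize_trainable(transformer)
```

If the trainable fraction is much larger than the LoRA ranks would suggest, the base weights are probably being updated as well, which would explain the extra per-step cost.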
Did you find any solution to speed up the LoRA training?
@gulucaptain Not yet. Did you use `--enable_xformers_memory_efficient_attention`?
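For context, in diffusers-based training scripts that flag typically maps to a call like the one below. This is a hedged sketch, not the exact code path of this repo's `train.py`; `model` is assumed to be a diffusers model (e.g. a UNet or transformer) that exposes `enable_xformers_memory_efficient_attention()`:

```python
import torch
from diffusers.utils.import_utils import is_xformers_available

def maybe_enable_xformers(model) -> None:
    """Enable xFormers memory-efficient attention if the package is installed."""
    if not torch.cuda.is_available():
        print("xFormers attention only helps on GPU; skipping.")
        return
    if is_xformers_available():
        # Assumes `model` is a diffusers ModelMixin; this switches its
        # attention processors to the xFormers memory-efficient kernels.
        model.enable_xformers_memory_efficient_attention()
        print("xFormers memory-efficient attention enabled.")
    else:
        print("xformers is not installed; using the default attention.")
```

If xformers is not installed in the environment, the flag silently has no effect, so checking `is_xformers_available()` once at startup is a cheap way to rule that out.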
My situation: 8x A100 80G, batch size 1, 19.35 s/it. It feels very slow. Is that normal? I use the default train.sh:

```sh
accelerate launch --mixed_precision='bf16' scripts/train.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATASET_NAME \
  --train_data_meta=$DATASET_META_NAME \
  --config_path "config/easyanimate_video_slicevae_multi_text_encoder_v4.yaml" \
  --image_sample_size=512 \
  --video_sample_size=512 \
  --token_sample_size=512 \
  --video_sample_stride=1 \
  --video_sample_n_frames=144 \
  --train_batch_size=1 \
  --video_repeat=1 \
  --gradient_accumulation_steps=1 \
  --dataloader_num_workers=8 \
  --num_train_epochs=100 \
  --checkpointing_steps=500 \
  --learning_rate=2e-05 \
  --lr_scheduler="constant_with_warmup" \
  --lr_warmup_steps=100 \
  --seed=42 \
  --output_dir="output_dir/ft_0.1Mv" \
  --enable_xformers_memory_efficient_attention \
  --gradient_checkpointing \
  --mixed_precision="bf16" \
  --adam_weight_decay=3e-2 \
  --adam_epsilon=1e-10 \
  --vae_mini_batch=1 \
  --max_grad_norm=0.05 \
  --random_hw_adapt \
  --training_with_video_token_length \
  --motion_sub_loss \
  --not_sigma_loss \
  --random_frame_crop \
  --enable_bucket \
  --train_mode="inpaint" \
  --trainable_modules "."
```
Please help me out, and let's help each other figure this out.
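To see whether ~19 s/it at these settings is dominated by GPU compute or by data loading, a rough timing split like the one below can help. It is generic PyTorch, not part of this repo's scripts; `model`, `dataloader`, `optimizer`, and `loss_fn` are placeholders for the objects the training loop already builds:

```python
import time
import torch

def time_steps(model, dataloader, optimizer, loss_fn, num_steps: int = 10):
    """Roughly split per-step wall time into data-loading and compute."""
    data_time, step_time = 0.0, 0.0
    it = iter(dataloader)
    for _ in range(num_steps):
        t0 = time.perf_counter()
        batch = next(it)                       # data loading / host-side prep
        torch.cuda.synchronize()
        t1 = time.perf_counter()

        loss = loss_fn(model, batch)           # forward
        loss.backward()                        # backward
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
        torch.cuda.synchronize()               # wait for GPU work to finish
        t2 = time.perf_counter()

        data_time += t1 - t0
        step_time += t2 - t1
    print(f"avg data: {data_time / num_steps:.2f}s, "
          f"avg compute: {step_time / num_steps:.2f}s per step")
```

If the data-loading share is large, raising `--dataloader_num_workers` or pre-decoding the videos may matter more than any attention or LoRA setting; if compute dominates, `--gradient_checkpointing` (which trades speed for memory) and the 144-frame, 512-resolution settings are the likely cost drivers.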
Dear author @yunkchen
Thanks for your awesome work!
I tried to run the LoRA training on my data, but the speed is very slow: ~40 s/it.
Training details:
- 512x512 model
- 2 GPUs, batch size 1 each
Is there anything I missed? Please give me some hints. Thanks!