JWargrave opened 2 days ago
We are doing fine-tuning work on cogvideox-factory; please use the diffusers version for fine-tuning, as the sat version can run out of memory (OOM) even when it otherwise runs successfully.
Does the diffusers version refer to train_cogvideox_image_to_video_lora.py? I ran it according to the instructions here, but encountered the following error:
[rank7]: Traceback (most recent call last):
[rank7]: File "/suqinzs/jwargrave/CogVideo-305/sat/train_cogvideox_image_to_video_lora.py", line 1620, in <module>
[rank7]: main(args)
[rank7]: File "/suqinzs/jwargrave/CogVideo-305/sat/train_cogvideox_image_to_video_lora.py", line 1428, in main
[rank7]: model_output = transformer(
[rank7]: ^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
[rank7]: else self._run_ddp_forward(*inputs, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
[rank7]: return self.module(*inputs, **kwargs) # type: ignore[index]
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/accelerate/utils/operations.py", line 823, in forward
[rank7]: return model_forward(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/accelerate/utils/operations.py", line 811, in __call__
[rank7]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[rank7]: return func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/jwargrave/diffusers-4729/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 470, in forward
[rank7]: ofs_emb = self.ofs_proj(ofs)
[rank7]: ^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/jwargrave/diffusers-4729/src/diffusers/models/embeddings.py", line 928, in forward
[rank7]: t_emb = get_timestep_embedding(
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/jwargrave/diffusers-4729/src/diffusers/models/embeddings.py", line 54, in get_timestep_embedding
[rank7]: assert len(timesteps.shape) == 1, "Timesteps should be a 1d-array"
[rank7]: ^^^^^^^^^^^^^^^
[rank7]: AttributeError: 'NoneType' object has no attribute 'shape'
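The AttributeError comes from `ofs_emb = self.ofs_proj(ofs)`: CogVideoX1.5 checkpoints set an `ofs` embedding dimension in the transformer config, so `forward` always projects `ofs`, but the 1.0-era LoRA training script never passes an `ofs` argument, so `None` reaches `get_timestep_embedding`, which reads `timesteps.shape`. Below is a minimal pure-Python sketch of that failure mode plus a guard modeled on what the I2V pipeline does for 1.5 models (the `ofs_embed_dim` name and the 2.0 fill value follow the diffusers source, but treat the exact fix as an assumption to verify against your diffusers version):

```python
# Toy stand-in for the diffusers transformer, NOT the real class: it only
# models the control flow that produces the AttributeError above.

class ToyCogVideoXTransformer:
    def __init__(self, ofs_embed_dim=None):
        # CogVideoX 1.0 checkpoints: ofs_embed_dim is None.
        # CogVideoX1.5 checkpoints: ofs_embed_dim is set, so ofs_proj exists.
        self.ofs_embed_dim = ofs_embed_dim

    def forward(self, timestep, ofs=None):
        if self.ofs_embed_dim is not None:
            # Real code path: ofs_emb = self.ofs_proj(ofs) ->
            # get_timestep_embedding(ofs) -> ofs.shape -> crash when None.
            if ofs is None:
                raise AttributeError("'NoneType' object has no attribute 'shape'")
        return "ok"


def make_ofs(transformer, batch_size, fill_value=2.0):
    """Guard to add before calling transformer(...) in the training loop.

    In the real script this would be something like
        ofs = (None if transformer.config.ofs_embed_dim is None
               else latents.new_full((1,), fill_value=2.0))
    mirroring CogVideoXImageToVideoPipeline's handling of 1.5 models.
    """
    if transformer.ofs_embed_dim is None:
        return None                       # 1.0 models take no ofs
    return [fill_value] * batch_size      # stand-in for a 1-D tensor
```

With such a guard in place, `transformer(..., ofs=make_ofs(...))` would work for both model generations; without it, a 1.5 checkpoint reproduces the crash above.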
The training script is as follows (Note that the pretrained_model_name_or_path
is CogVideoX1.5-5B-I2V):
#!/bin/bash
clear
GPU_IDS="0,1,2,3,4,5,6,7"
accelerate launch --gpu_ids $GPU_IDS train_cogvideox_image_to_video_lora.py \
--pretrained_model_name_or_path ./pretrained_weights/CogVideoX1.5-5B-I2V \
--cache_dir ./cache \
--instance_data_root ./storyboard_data_for_cog \
  --caption_column prompts_cogvlm2_debug.txt \
--video_column videos_debug.txt \
--validation_prompt "A woman is sitting in a basket under a tree. She is holding a pink flower and looking at it. The basket is made of straw and the woman is wearing a white dress. There are green leaves on the ground around her." \
--validation_images "a.jpg" \
--num_validation_videos 1 \
--validation_epochs 10 \
--seed 42 \
--rank 64 \
--lora_alpha 64 \
--mixed_precision fp16 \
--output_dir ./output-cogvideox-lora \
--height 480 --width 720 --fps 8 --max_num_frames 49 --skip_frames_start 0 --skip_frames_end 0 \
--train_batch_size 1 \
--num_train_epochs 30 \
--checkpointing_steps 1000 \
--gradient_accumulation_steps 1 \
--learning_rate 1e-3 \
--lr_scheduler cosine_with_restarts \
--lr_warmup_steps 200 \
--lr_num_cycles 1 \
--enable_slicing \
--enable_tiling \
--optimizer Adam \
--adam_beta1 0.9 \
--adam_beta2 0.95 \
--max_grad_norm 1.0
We need to look into this @zhipuch
Hello, so the current SAT version needs 76 GB of VRAM because it does full-parameter training, while the diffusers version is LoRA fine-tuning, is that right?
Can I modify examples/cogvideo/train_cogvideox_image_to_video_lora.py in diffusers into a full fine-tuning script? The training script sat/finetune_multi_gpus.sh has some bugs.
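On turning the LoRA script into a full fine-tune: in principle the change is to skip the adapter injection, mark every transformer parameter trainable, hand all of them to the optimizer, and save full weights instead of LoRA layers. A toy sketch of that control flow, under those assumptions (`Param` stands in for `torch.nn.Parameter`; real code would operate on `transformer.parameters()` and need far more VRAM than the LoRA run):

```python
# Toy sketch of the LoRA -> full-finetune change; Param stands in for
# torch.nn.Parameter so the control flow is visible without torch/a GPU.

class Param:
    def __init__(self, name):
        self.name = name
        self.requires_grad = False  # the script first freezes the transformer


def select_trainable(params, full_finetune):
    """LoRA path keeps the base weights frozen (adapters would be injected
    separately); the full-finetune path unfreezes everything instead."""
    if full_finetune:
        for p in params:
            p.requires_grad = True  # real code: transformer.requires_grad_(True)
    # Either way, the optimizer should only receive trainable parameters, e.g.
    # optimizer = torch.optim.AdamW((p for p in transformer.parameters()
    #                                if p.requires_grad), lr=args.learning_rate)
    return [p.name for p in params if p.requires_grad]
```

Checkpoint saving would also change: instead of collecting LoRA layers into a state dict, the full-finetune variant would save the whole transformer (e.g. via `save_pretrained`), which is one reason its checkpoints and optimizer state are so much larger.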