a-r-r-o-w / cogvideox-factory

Memory optimized finetuning scripts for CogVideoX using TorchAO and DeepSpeed
Apache License 2.0

How to set the hyperparameters when finetuning I2V model with LoRA? #68

Open TousakaNagio opened 3 weeks ago

TousakaNagio commented 3 weeks ago

File "/home/shinji106/ntu/cogvideox-factory/training/dataset.py", line 411, in iter
self.buckets[(f, h, w)].append(data)
KeyError: (16, 320, 720)

The resolution is (13, 320, 480), so the key of self.buckets does not match the input. How should I set the hyperparameters when running prepare_dataset.sh and train_image_to_video_lora.sh so that the keys match?
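For context, my rough understanding of the bucket sampler in training/dataset.py (a hypothetical simplification, not the actual code; the bucket values below are made up for illustration) is something like this, which is why the lookup fails when a sample's shape was never configured as a bucket:

from itertools import product

# Hypothetical simplification: a bucket exists only for each configured
# (frames, height, width) combination, and every sample is appended to
# the bucket matching its own shape.
frame_buckets = [13]    # assumed values, for illustration only
height_buckets = [320]
width_buckets = [480]
buckets = {key: [] for key in product(frame_buckets, height_buckets, width_buckets)}

sample_shape = (16, 320, 720)  # the key from the traceback above
try:
    buckets[sample_shape].append("data")
except KeyError as err:
    print(f"KeyError: {err}")  # no bucket was created for this shape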

dvschultz commented 3 days ago

Were you able to solve this? I'm having a similar issue with train_text_to_video_lora.sh using the sample Disney video dataset.

Running command: accelerate launch --config_file accelerate_configs/uncompiled_1.yaml --gpu_ids 0 training/cogvideox_text_to_video_lora.py \
  --pretrained_model_name_or_path THUDM/CogVideoX-5b \
  --data_root /content/drive/MyDrive/cogvideox-factory/video-dataset-disney-processed \
  --caption_column prompts.txt \
  --video_column videos.txt \
  --id_token BW_STYLE \
  --height_buckets 480 \
  --width_buckets 720 \
  --frame_buckets 49 \
  --dataloader_num_workers 8 \
  --pin_memory \
  --validation_prompt "BW_STYLE A black and white animated scene unfolds with an anthropomorphic goat surrounded by musical notes and symbols, suggesting a playful environment. Mickey Mouse appears, leaning forward in curiosity as the goat remains still. The goat then engages with Mickey, who bends down to converse or react. The dynamics shift as Mickey grabs the goat, potentially in surprise or playfulness, amidst a minimalistic background. The scene captures the evolving relationship between the two characters in a whimsical, animated setting, emphasizing their interactions and emotions:::BW_STYLE A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance" \
  --validation_prompt_separator ::: \
  --num_validation_videos 1 \
  --validation_epochs 10 \
  --seed 42 \
  --rank 128 \
  --lora_alpha 128 \
  --mixed_precision bf16 \
  --output_dir ./cogvideox-lora__optimizer_adamw__steps_3000__lr-schedule_cosine_with_restarts__learning-rate_1e-4/ \
  --max_num_frames 49 \
  --train_batch_size 1 \
  --max_train_steps 3000 \
  --checkpointing_steps 1000 \
  --gradient_accumulation_steps 1 \
  --gradient_checkpointing \
  --learning_rate 1e-4 \
  --lr_scheduler cosine_with_restarts \
  --lr_warmup_steps 400 \
  --lr_num_cycles 1 \
  --enable_slicing \
  --enable_tiling \
  --enable_model_cpu_offload \
  --load_tensors \
  --optimizer adamw \
  --beta1 0.9 \
  --beta2 0.95 \
  --weight_decay 0.001 \
  --max_grad_norm 1.0 \
  --allow_tf32 \
  --report_to wandb \
  --nccl_timeout 1800
2024-11-11 22:09:24.906180: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-11-11 22:09:24.924515: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-11 22:09:24.945701: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-11 22:09:24.952273: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-11 22:09:24.967960: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-11 22:09:26.182734: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Downloading shards: 100% 2/2 [00:00<00:00, 6786.90it/s]
Loading checkpoint shards: 100% 2/2 [00:10<00:00,  5.43s/it]
Fetching 2 files: 100% 2/2 [00:00<00:00, 33288.13it/s]
{'use_learned_positional_embeddings'} was not found in config. Values will be initialized to default values.
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Tracking run with wandb version 0.18.5
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
===== Memory before training =====
memory_allocated=19.906 GB
max_memory_allocated=19.906 GB
max_memory_reserved=20.006 GB
***** Running training *****
  Num trainable parameters = 132120576
  Num examples = 69
  Num batches each epoch = 69
  Num epochs = 44
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 1
  Gradient accumulation steps = 1
  Total optimization steps = 3000
Steps:   0% 0/3000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/drive/MyDrive/cogvideox-factory/training/cogvideox_text_to_video_lora.py", line 946, in <module>
    main(args)
  File "/content/drive/MyDrive/cogvideox-factory/training/cogvideox_text_to_video_lora.py", line 645, in main
    for step, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/accelerate/data_loader.py", line 547, in __iter__
    dataloader_iter = self.base_dataloader.__iter__()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 440, in __iter__
    return self._get_iterator()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1083, in __init__
    self._reset(loader, first_iter=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1116, in _reset
    self._try_put_index()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1350, in _try_put_index
    index = self._next_index()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 620, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 288, in __iter__
    for idx in self.sampler:
  File "/content/drive/MyDrive/cogvideox-factory/training/dataset.py", line 410, in __iter__
    self.buckets[(f, h, w)].append(data)
KeyError: (49, 384, 576)

Those values seem to be correct per the dataset specs, but it still throws an error.
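As a sanity check, here is a rough sketch (assuming the sampler builds one bucket per combination of the --frame_buckets / --height_buckets / --width_buckets values, which I have not verified) of why the key might be missing:

from itertools import product

# Buckets implied by the launch command above (assumption: one bucket per
# combination of the three flags).
frame_buckets = [49]    # --frame_buckets 49
height_buckets = [480]  # --height_buckets 480
width_buckets = [720]   # --width_buckets 720

buckets = set(product(frame_buckets, height_buckets, width_buckets))
print(buckets)                    # {(49, 480, 720)}
print((49, 384, 576) in buckets)  # False -> the KeyError above

If that is what is happening, the preprocessed samples are apparently 49 frames at 384x576, so either adding 384/576 as height/width bucket values (if those flags accept multiple values) or re-running prepare_dataset.sh so the outputs match 480x720 would be my next things to try.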