a-r-r-o-w / cogvideox-factory

Memory optimized finetuning scripts for CogVideoX using TorchAO and DeepSpeed
Apache License 2.0

torch._dynamo.exc.TorchRuntimeError: Cannot convert -oo to int #12

Closed. glide-the closed this issue 2 days ago.

glide-the commented 3 days ago

After running prepare_dataset, a type error occurred when training with --load_tensors.

train_text_to_video_lora.sh:

#!/bin/bash

export TORCH_LOGS="+dynamo,recompiles,graph_breaks"
export TORCHDYNAMO_VERBOSE=1
export WANDB_MODE="online"
export NCCL_P2P_DISABLE=1
export TORCH_NCCL_ENABLE_MONITORING=0
export WANDB_API_KEY=

GPU_IDS="4,5,6,7"

# Training Configurations
# Experiment with as many hyperparameters as you want!
LEARNING_RATES=("1e-4" "1e-3")
LR_SCHEDULES=("cosine_with_restarts")
OPTIMIZERS=("adamw" "adam")
MAX_TRAIN_STEPS=("3000")

# Single GPU compiled training
ACCELERATE_CONFIG_FILE="accelerate_configs/compiled_1.yaml"

# Absolute path to where the data is located. Make sure to have read the README for how to prepare data.
# This example assumes you downloaded an already prepared dataset from HF CLI as follows:
#   huggingface-cli download --repo-type dataset Wild-Heart/Disney-VideoGeneration-Dataset --local-dir /path/to/my/datasets/disney-dataset
DATA_ROOT="/mnt/ceph/develop/jiawei/lora_dataset/Dance-VideoGeneration-Dataset-encoded"
CAPTION_COLUMN="prompts.txt"
VIDEO_COLUMN="videos.txt"

# Launch experiments with different hyperparameters
for learning_rate in "${LEARNING_RATES[@]}"; do
  for lr_schedule in "${LR_SCHEDULES[@]}"; do
    for optimizer in "${OPTIMIZERS[@]}"; do
      for steps in "${MAX_TRAIN_STEPS[@]}"; do
        output_dir="/mnt/ceph/develop/jiawei/model_checkpoint/cogvideox-lora__optimizer_${optimizer}__steps_${steps}__lr-schedule_${lr_schedule}__learning-rate_${learning_rate}/"

        cmd="accelerate launch --config_file $ACCELERATE_CONFIG_FILE --gpu_ids $GPU_IDS training/cogvideox_text_to_video_lora.py \
          --pretrained_model_name_or_path /mnt/ceph/develop/jiawei/model_checkpoint/CogVideoX-2b-base \
          --load_tensors \
          --data_root $DATA_ROOT \
          --caption_column $CAPTION_COLUMN \
          --video_column $VIDEO_COLUMN \
          --height_buckets 960 \
          --width_buckets 720 \
          --frame_buckets 49 \
          --dataloader_num_workers 8 \
          --pin_memory \
          --id_token \"奶糖,\" \
          --validation_prompt \"奶糖, A young girl in a white blouse and navy skirt stands in a sunlit park, smiling and holding up two fingers. She's surrounded by trees and a pathway, with dappled sunlight casting shadows. A young woman in a school uniform stands on a tree-lined path, surprised, with hands raised. In the park, a woman in a white blouse with a navy collar raises her hands in a playful 'V' shape, surrounded by lush greenery and sunlight.:::奶糖, A young woman with long dark hair tied into ponytails stands in a cozy, warmly lit room, smiling gently at the camera. She takes a selfie, her hair styled in loose waves, with a playful expression. The background is a plain, light-colored wall, emphasizing her features.\" \
          --validation_prompt_separator ::: \
          --num_validation_videos 1 \
          --validation_epochs 10 \
          --seed 42 \
          --rank 128 \
          --lora_alpha 1 \
          --mixed_precision bf16 \
          --output_dir $output_dir \
          --max_num_frames 49 \
          --train_batch_size 1 \
          --max_train_steps $steps \
          --checkpointing_steps 1000 \
          --gradient_accumulation_steps 1 \
          --gradient_checkpointing \
          --learning_rate $learning_rate \
          --lr_scheduler $lr_schedule \
          --lr_warmup_steps 400 \
          --lr_num_cycles 1 \
          --enable_slicing \
          --enable_tiling \
          --optimizer $optimizer \
          --beta1 0.9 \
          --beta2 0.95 \
          --weight_decay 0.001 \
          --max_grad_norm 1.0 \
          --allow_tf32 \
          --report_to wandb \
          --tracker_name cogvideox-lora__optimizer_${optimizer}__steps_${steps}__lr-schedule_${lr_schedule}__learning-rate_${learning_rate} \
          --nccl_timeout 1800"

        echo "Running command: $cmd"
        eval $cmd
        echo -ne "-------------------- Finished executing script --------------------\n\n"
      done
    done
  done
done
glide-the commented 3 days ago

Could this be the problem? With load_tensors, the data is processed at load time, but it was not preprocessed in advance, so this operation is missing.

(two code screenshots attached)
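For context, here is a minimal sketch of what I understand --load_tensors to toggle (the function name and layout below are illustrative, not the repo's actual API):

# Minimal sketch of the --load_tensors path as I understand it; the function
# name and layout are illustrative, not the repo's actual API.
import torch

def get_latents(path: str, weight_dtype: torch.dtype, precomputed: bool = True) -> torch.Tensor:
    if precomputed:
        # --load_tensors: prepare_dataset.py --save_tensors already ran the VAE,
        # so training only loads the saved latents from disk.
        latents = torch.load(path, map_location="cpu", weights_only=True)
    else:
        # Without --load_tensors, frames would be decoded and VAE-encoded here.
        raise NotImplementedError("sketch covers only the precomputed path")
    # Cast to the training dtype (older code used .float() here).
    return latents.to(dtype=weight_dtype)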

glide-the commented 3 days ago

> Could this be the problem? With load_tensors, the data is processed at load time, but it was not preprocessed in advance, so this operation is missing.

I tried without load_tensors, and the same problem occurred.

a-r-r-o-w commented 3 days ago

Could you post the error stack trace? I can't seem to reproduce it; dataset preparation followed by training works without any issues using the bash scripts we have in the main branch.

glide-the commented 3 days ago

The video resolution is w=720, h=960. https://huggingface.co/datasets/Wild-Heart/Dance-VideoGeneration-Dataset
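For reference, the source resolution can be confirmed with decord (already in the environment listed below); the file name is a placeholder:

# Confirm the native resolution of one dataset video ("video.mp4" is a placeholder).
import decord

reader = decord.VideoReader("video.mp4")
frame = reader[0].asnumpy()               # shape is (H, W, C)
print(frame.shape)                        # expect (960, 720, 3) here
print(len(reader), reader.get_avg_fps())  # frame count and average FPS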

prepare_dataset.sh:

#!/bin/bash

MODEL_ID="/mnt/ceph/develop/jiawei/model_checkpoint/CogVideoX-2b-base"

# For more details on the expected data format, please refer to the README.
DATA_ROOT="/mnt/ceph/develop/jiawei/lora_dataset/Dance-VideoGeneration-Dataset"  # This needs to be the path to the base directory where your videos are located.
CAPTION_COLUMN="prompts.txt"
VIDEO_COLUMN="videos.txt"
OUTPUT_DIR="/mnt/ceph/develop/jiawei/lora_dataset/Dance-VideoGeneration-Dataset-encoded"
HEIGHT=480
WIDTH=720
MAX_NUM_FRAMES=49
MAX_SEQUENCE_LENGTH=226
TARGET_FPS=8
BATCH_SIZE=1
DTYPE=fp32

# Creates a folder-style dataset structure without pre-encoding videos and captions
CMD_WITHOUT_PRE_ENCODING="\
  python3 training/prepare_dataset.py \
    --model_id $MODEL_ID \
    --data_root $DATA_ROOT \
    --caption_column $CAPTION_COLUMN \
    --video_column $VIDEO_COLUMN \
    --output_dir $OUTPUT_DIR \
    --height $HEIGHT \
    --width $WIDTH \
    --max_num_frames $MAX_NUM_FRAMES \
    --max_sequence_length $MAX_SEQUENCE_LENGTH \
    --target_fps $TARGET_FPS \
    --batch_size $BATCH_SIZE \
    --dtype $DTYPE
"

CMD_WITH_PRE_ENCODING="$CMD_WITHOUT_PRE_ENCODING --save_tensors"

# Select which you'd like to run
CMD=$CMD_WITH_PRE_ENCODING

echo "===== Running \`$CMD\` ====="
eval $CMD
echo -ne "===== Finished running script =====\n"
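A quick sanity check of the pre-encoded output is sketched below, assuming --save_tensors writes .pt files somewhere under OUTPUT_DIR (the glob may need adjusting to the actual layout):

# Inspect a few of the saved tensors to confirm shapes and dtypes
# (assumes .pt files under the encoded dataset directory).
from pathlib import Path
import torch

output_dir = Path("/mnt/ceph/develop/jiawei/lora_dataset/Dance-VideoGeneration-Dataset-encoded")
for pt_file in sorted(output_dir.rglob("*.pt"))[:5]:
    obj = torch.load(pt_file, map_location="cpu", weights_only=True)
    print(pt_file.name, getattr(obj, "shape", type(obj)), getattr(obj, "dtype", ""))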

Here are the detailed log information https://wandb.ai/dmeck/cogvideox-lora__optimizer_adam__steps_3000__lr-schedule_cosine_with_restarts__learning-rate_1e-3/runs/hmqiseyi/logs

glide-the commented 3 days ago
pip list
Package                  Version     Editable project location
------------------------ ----------- -------------------------------------------
accelerate               1.0.0
bitsandbytes             0.44.1
certifi                  2024.8.30
charset-normalizer       3.3.2
click                    8.1.7
decord                   0.6.0
diffusers                0.31.0.dev0 /mnt/ceph/develop/jiawei/diffusers_fork_zmf
docker-pycreds           0.4.0
filelock                 3.16.1
fsspec                   2024.9.0
gitdb                    4.0.11
GitPython                3.1.43
hf_transfer              0.1.8
huggingface-hub          0.25.1
idna                     3.10
imageio                  2.35.1
importlib_metadata       8.5.0
Jinja2                   3.1.4
MarkupSafe               3.0.1
mpmath                   1.3.0
networkx                 3.3
numpy                    2.1.1
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.6.77
nvidia-nvtx-cu12         12.1.105
packaging                24.1
pandas                   2.2.3
peft                     0.13.1
pillow                   10.4.0
pip                      24.2
platformdirs             4.3.6
protobuf                 5.28.2
psutil                   6.0.0
python-dateutil          2.9.0.post0
pytz                     2024.2
PyYAML                   6.0.2
regex                    2024.9.11
requests                 2.32.3
safetensors              0.4.5
sentencepiece            0.2.0
sentry-sdk               2.16.0
setproctitle             1.3.3
setuptools               75.1.0
six                      1.16.0
smmap                    5.0.1
sympy                    1.13.3
tokenizers               0.20.0
torch                    2.4.1
torchao                  0.5.0
torchvision              0.19.1
tqdm                     4.66.5
transformers             4.45.2
triton                   3.0.0
typing_extensions        4.12.2
tzdata                   2024.2
urllib3                  2.2.3
wandb                    0.18.3
wheel                    0.44.0
zipp                     3.20.2
glide-the commented 3 days ago

I tried downscaling the videos to 480x720 and found that training runs normally. I will submit a PR later.
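For illustration, one way to downscale frames to 480x720 with torchvision (an assumed approach; the actual change may differ):

# Downscale a (frames, channels, H, W) clip from 960x720 to 480x720.
import torch
from torchvision.transforms.v2 import functional as F

frames = torch.randint(0, 256, (49, 3, 960, 720), dtype=torch.uint8)
resized = F.resize(frames, size=[480, 720], antialias=True)
print(resized.shape)  # torch.Size([49, 3, 480, 720])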

a-r-r-o-w commented 3 days ago

Thanks for the PR. Happy to merge it soon; I'm just going to take a look at the w=720, h=960 case to see what's wrong with the dance dataset.

a-r-r-o-w commented 3 days ago

> Could this be the problem? With load_tensors, the data is processed at load time, but it was not preprocessed in advance, so this operation is missing.
>
> (two code screenshots attached)

From the code screenshots, it looks like this is actually not the latest version of the code. We changed the .float() call to .to(dtype=weight_dtype). Either way, I'll investigate the torch._dynamo error.

a-r-r-o-w commented 3 days ago

I tried a few things but was not able to replicate the error. The finetuning works fine with the dance dataset at 720x960 resolution too. Could you pull the changes from the main branch and try again? Instead of .float(), we now cast to weight_dtype, so any TypeErrors should not be happening. As for the torch._dynamo.exc.TorchRuntimeError, I'm not sure what caused it for you.
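In other words, the change amounts to this pattern (a minimal sketch; the latent shape is illustrative only):

import torch

weight_dtype = torch.bfloat16             # matches --mixed_precision bf16
latents = torch.randn(1, 16, 13, 60, 90)  # illustrative latent shape only

# old: latents = latents.float()
latents = latents.to(dtype=weight_dtype)  # new: cast to the training dtype
print(latents.dtype)                      # torch.bfloat16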

glide-the commented 2 days ago

I checked out the main branch, and the error no longer showed up during training.