Could this be the problem? With load_tensors, the data is processed at load time, but there is no preprocessing done in advance and no such operation exists.
I tried without load_tensors, and the same problem occurred.
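For context, the two code paths being discussed can be sketched roughly as follows. This is a hypothetical stand-in, not the repository's actual implementation: fake_vae_encode, the latent shape (4 channels, 8x spatial downsample), and the file layout are all made up for illustration.

```python
import os
import tempfile

import torch


def fake_vae_encode(frames: torch.Tensor) -> torch.Tensor:
    # Stand-in for the real VAE encoder: 4 latent channels,
    # 8x spatial downsampling (hypothetical numbers).
    f, _, h, w = frames.shape
    return torch.randn(f, 4, h // 8, w // 8)


# --save_tensors path: encode once during prepare_dataset and store the latents.
frames = torch.randn(49, 3, 480, 720)  # dummy clip at the training resolution
latents = fake_vae_encode(frames)
path = os.path.join(tempfile.mkdtemp(), "latents.pt")
torch.save(latents, path)

# load_tensors path at train time: no decoding or re-encoding, just load.
loaded = torch.load(path)
print(loaded.shape)  # torch.Size([49, 4, 60, 90])
```

In the second mode, any resolution or dtype mismatch is baked into the saved tensors, which is why problems can surface only at training time.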
Could you post the error stack trace? I can't seem to reproduce it; dataset preparation followed by training works without any issues using the bash scripts in the main branch.
The video resolution is w=720 , h=960. https://huggingface.co/datasets/Wild-Heart/Dance-VideoGeneration-Dataset
prepare_dataset script:
#!/bin/bash
MODEL_ID="/mnt/ceph/develop/jiawei/model_checkpoint/CogVideoX-2b-base"
# For more details on the expected data format, please refer to the README.
DATA_ROOT="/mnt/ceph/develop/jiawei/lora_dataset/Dance-VideoGeneration-Dataset" # This needs to be the path to the base directory where your videos are located.
CAPTION_COLUMN="prompts.txt"
VIDEO_COLUMN="videos.txt"
OUTPUT_DIR="/mnt/ceph/develop/jiawei/lora_dataset/Dance-VideoGeneration-Dataset-encoded"
HEIGHT=480
WIDTH=720
MAX_NUM_FRAMES=49
MAX_SEQUENCE_LENGTH=226
TARGET_FPS=8
BATCH_SIZE=1
DTYPE=fp32
# To create a folder-style dataset structure without pre-encoding videos and captions
CMD_WITHOUT_PRE_ENCODING="\
python3 training/prepare_dataset.py \
--model_id $MODEL_ID \
--data_root $DATA_ROOT \
--caption_column $CAPTION_COLUMN \
--video_column $VIDEO_COLUMN \
--output_dir $OUTPUT_DIR \
--height $HEIGHT \
--width $WIDTH \
--max_num_frames $MAX_NUM_FRAMES \
--max_sequence_length $MAX_SEQUENCE_LENGTH \
--target_fps $TARGET_FPS \
--batch_size $BATCH_SIZE \
--dtype $DTYPE
"
CMD_WITH_PRE_ENCODING="$CMD_WITHOUT_PRE_ENCODING --save_tensors"
# Select which you'd like to run
CMD=$CMD_WITH_PRE_ENCODING
echo "===== Running \`$CMD\` ====="
eval $CMD
echo -ne "===== Finished running script =====\n"
Here are the detailed log information https://wandb.ai/dmeck/cogvideox-lora__optimizer_adam__steps_3000__lr-schedule_cosine_with_restarts__learning-rate_1e-3/runs/hmqiseyi/logs
pip list
Package Version Editable project location
------------------------ ----------- -------------------------------------------
accelerate 1.0.0
bitsandbytes 0.44.1
certifi 2024.8.30
charset-normalizer 3.3.2
click 8.1.7
decord 0.6.0
diffusers 0.31.0.dev0 /mnt/ceph/develop/jiawei/diffusers_fork_zmf
docker-pycreds 0.4.0
filelock 3.16.1
fsspec 2024.9.0
gitdb 4.0.11
GitPython 3.1.43
hf_transfer 0.1.8
huggingface-hub 0.25.1
idna 3.10
imageio 2.35.1
importlib_metadata 8.5.0
Jinja2 3.1.4
MarkupSafe 3.0.1
mpmath 1.3.0
networkx 3.3
numpy 2.1.1
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.77
nvidia-nvtx-cu12 12.1.105
packaging 24.1
pandas 2.2.3
peft 0.13.1
pillow 10.4.0
pip 24.2
platformdirs 4.3.6
protobuf 5.28.2
psutil 6.0.0
python-dateutil 2.9.0.post0
pytz 2024.2
PyYAML 6.0.2
regex 2024.9.11
requests 2.32.3
safetensors 0.4.5
sentencepiece 0.2.0
sentry-sdk 2.16.0
setproctitle 1.3.3
setuptools 75.1.0
six 1.16.0
smmap 5.0.1
sympy 1.13.3
tokenizers 0.20.0
torch 2.4.1
torchao 0.5.0
torchvision 0.19.1
tqdm 4.66.5
transformers 4.45.2
triton 3.0.0
typing_extensions 4.12.2
tzdata 2024.2
urllib3 2.2.3
wandb 0.18.3
wheel 0.44.0
zipp 3.20.2
I tried resizing the videos down to 480x720 and found that training works normally. I will submit a PR later.
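A minimal sketch of that resize, assuming the frames arrive as a [frames, channels, height, width] tensor; the resize_video helper is mine for illustration, not the PR's actual code.

```python
import torch
import torch.nn.functional as F


def resize_video(frames: torch.Tensor, height: int, width: int) -> torch.Tensor:
    # Bilinearly resize every frame of a [F, C, H, W] video tensor.
    return F.interpolate(frames, size=(height, width), mode="bilinear", align_corners=False)


video = torch.randn(49, 3, 960, 720)  # dummy clip at the dance dataset's resolution
resized = resize_video(video, 480, 720)
print(resized.shape)  # torch.Size([49, 3, 480, 720])
```

Resizing to the resolution the base model was trained at (480x720 for CogVideoX-2b) sidesteps any shape assumptions downstream of the VAE.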
Thanks for the PR. Happy to merge it soon; I'm just going to take a look at the w=720, h=960 case first to see what's wrong with the dance dataset.
Could this be the problem? With load_tensors, the data is processed at load time, but there is no preprocessing done in advance and no such operation exists.
From the code screenshots, it looks like this is not actually the latest version of the code. We changed the .float() to .to(dtype=weight_dtype). Either way, I'll investigate the torch dynamo error.
I tried a few things but was not able to replicate the error. The finetuning works fine with the dance dataset at 720x960 resolution too. Could you try pulling changes from the main branch and trying again? Instead of .float(), we now use weight_dtype, so no TypeErrors should be happening. As for the torch._dynamo.exc.TorchRuntimeError, I'm not sure what caused it for you.
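To illustrate the dtype issue being discussed, here is a minimal standalone sketch; the tensor shapes and the weight_dtype value are made up, not the trainer's actual code. Mixing fp32 latents with lower-precision weights raises a runtime error (surfacing as a TorchRuntimeError under torch.compile), while casting to the model's weight dtype does not.

```python
import torch

weight_dtype = torch.bfloat16  # training dtype, e.g. under mixed precision

latents = torch.randn(1, 4, 8, 8, dtype=torch.float32)  # pre-encoded latents loaded as fp32
weight = torch.randn(4, 4, dtype=weight_dtype)

# Mixed-dtype matmul raises a RuntimeError in eager mode
# (and a TorchRuntimeError when wrapped by torch.compile / dynamo):
try:
    _ = latents.flatten(2).transpose(1, 2) @ weight
except RuntimeError as e:
    print("dtype mismatch:", e)

# Casting to the model's weight dtype (instead of a hard .float()) avoids it:
out = latents.to(dtype=weight_dtype).flatten(2).transpose(1, 2) @ weight
print(out.dtype)  # torch.bfloat16
```

This is why loading fp32 tensors saved by prepare_dataset into a half-precision training run can fail unless the trainer casts them to weight_dtype first.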
I checked out the main branch, and the error didn't show up in this training run.
After running prepare_dataset, a type error occurred when training with load_tensors.
Logs: train_text_to_video_lora.sh