Introducing the Stable Video Diffusion Temporal ControlNet! This tool pairs a ControlNet-style encoder with the SVD base model. It's designed to give your video diffusion projects precise temporal control.
Install the dependencies first:
pip install -r requirements.txt
My example training configuration looks like this:
accelerate launch train_svd.py \
--pretrained_model_name_or_path="stabilityai/stable-video-diffusion-img2vid" \
--output_dir="model_out" \
--csv_path="path-to-your-csv" \
--video_folder="path-to-your-videos" \
--depth_folder="path-to-your-depth" \
--motion_folder="path-to-your-motion" \
--validation_image_folder="./validation_demo/rgb" \
--validation_control_folder="./validation_demo/depth" \
--width=512 \
--height=512 \
--learning_rate=2e-5 \
--per_gpu_batch_size=8 \
--num_train_epochs=5 \
--mixed_precision="fp16" \
--gradient_accumulation_steps=2 \
--checkpointing_steps=2000 \
--validation_steps=400 \
--gradient_checkpointing
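
As a rough guide to how the batch flags above interact: the number of samples behind each optimizer step is usually the per-GPU batch size times the gradient accumulation steps times the number of processes. The sketch below assumes `--per_gpu_batch_size` and `--gradient_accumulation_steps` follow those common Accelerate-style semantics. The helper name and the `num_gpus` argument are hypothetical and not part of this repository.

```python
def effective_batch_size(per_gpu_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Samples contributing to one optimizer step (assumed semantics)."""
    return per_gpu_batch_size * gradient_accumulation_steps * num_gpus


# Flag values taken from the example command; the GPU counts are assumptions.
single_gpu = effective_batch_size(8, 2, num_gpus=1)
four_gpus = effective_batch_size(8, 2, num_gpus=4)
print(single_gpu, four_gpus)
```

If you change the number of GPUs, you may want to adjust `--gradient_accumulation_steps` or `--learning_rate` so the effective batch size stays in a range that works for your data.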