YBYBZhang / ControlVideo

[ICLR 2024] Official pytorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"

problem with triton #15

Open gonduras opened 1 year ago

gonduras commented 1 year ago

(venv) E:\ControlVideo>python inference.py
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton.language'
Traceback (most recent call last):
  File "E:\ControlVideo\inference.py", line 20, in <module>
    from models.pipeline_controlvideo import ControlVideoPipeline
  File "E:\ControlVideo\models\pipeline_controlvideo.py", line 28, in <module>
    from .controlnet import ControlNetOutput
  File "E:\ControlVideo\models\controlnet.py", line 27, in <module>
    from .controlnet_unet_blocks import (
  File "E:\ControlVideo\models\controlnet_unet_blocks.py", line 6, in <module>
    from .controlnet_attention import Transformer3DModel
  File "E:\ControlVideo\models\controlnet_attention.py", line 15, in <module>
    from diffusers.models.attention import CrossAttention, FeedForward, AdaLayerNorm
ImportError: cannot import name 'CrossAttention' from 'diffusers.models.attention' (E:\ControlVideo\venv\lib\site-packages\diffusers\models\attention.py)
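For context: CrossAttention was removed from diffusers.models.attention in later diffusers releases, so this import error usually means the installed diffusers is newer than the version the repo was written against. A quick check, as a sketch (the exact pin to match is whatever your local requirements.txt says; diffusers==0.14.0 is an assumption based on the repo around this time):

import importlib.metadata

# Print the installed diffusers version. If it is newer than the pin in
# requirements.txt (assumed diffusers==0.14.0 here -- verify locally),
# CrossAttention will no longer exist at the old import path.
print(importlib.metadata.version("diffusers"))

# If it is too new, reinstall the pinned version, e.g.:
#   pip install diffusers==0.14.0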

JPW0080 commented 1 year ago

pip install https://huggingface.co/r4ziel/xformers_pre_built/resolve/main/triton-2.0.0-cp310-cp310-win_amd64.whl
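If you want to confirm the wheel took, this snippet exercises the exact import that failed above (standard Triton modules, nothing project-specific):

import triton            # the package the wheel provides
import triton.language   # the module the original error said was missing

print(triton.__version__)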

gonduras commented 1 year ago

Thank you! Can you help me with the setup? Now I get this:

(venv) E:\control-a-video>python inference.py --prompt "gondurastration" --input_video bear.mp4 --control_mode depth
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: DLL load failed while importing libtriton: The specified module could not be found.
controlnet\diffusion_pytorch_model.safetensors not found

I put the ControlNet file from https://huggingface.co/lllyasviel/sd-controlnet-depth/blob/main/diffusion_pytorch_model.safetensors in the controlnet directory.

JPW0080 commented 1 year ago

git pull for the latest update, then:

pip install -r requirements.txt

Here is my folder structure. The checkpoints folder contains:

models--lllyasviel--ControlNet
sd-controlnet-canny
sd-controlnet-depth
sd-controlnet-openpose
stable-diffusion-v1-5
flownet.pkl

And here is a command that was used for the 1.0 ControlNet:

python inference.py --prompt "Solid platinum, Statue of Liberty, moonwalk" --condition depth_midas --video_path "data/moonwalk.mp4" --output_path "outputs" --video_length 15 --smoother_steps 19 20 --width 512 --height 512 --frame_rate 2 --version v10

Lastly, take a look at the def get_args() section in https://github.com/YBYBZhang/ControlVideo/blob/master/inference.py
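For readers who can't open the file right now, here is a sketch of the kind of argparse setup get_args() holds. The flag names are copied from the commands in this thread, not verified against the current file, so treat the defaults as placeholders:

import argparse

def get_args():
    # Flags taken from the working commands in this thread; the real
    # inference.py may define more options or different defaults.
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt", type=str, required=True)
    parser.add_argument("--condition", type=str, default="depth_midas")
    parser.add_argument("--video_path", type=str, required=True)
    parser.add_argument("--output_path", type=str, default="outputs")
    parser.add_argument("--video_length", type=int, default=15)
    parser.add_argument("--smoother_steps", type=int, nargs="+", default=[19, 20])
    parser.add_argument("--width", type=int, default=512)
    parser.add_argument("--height", type=int, default=512)
    parser.add_argument("--frame_rate", type=int, default=2)
    parser.add_argument("--version", type=str, default="v10")
    parser.add_argument("--seed", type=int, default=2023)
    return parser.parse_args()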

gonduras commented 1 year ago

Thanks @JPW0080, still struggling:

(venv) E:\ControlVideo>python inference.py --prompt "gondurastration" --condition depth_midas --video_path "data/moonwalk.mp4" --output_path "outputs" --video_length 15 --smoother_steps 19 20 --width 512 --height 512 --frame_rate 2 --version v10
Downloading (…)id-midas-501f0c75.pt: 100%|██████████| 493M/493M [07:07<00:00, 1.15MB/s]
E:\ControlVideo\venv\lib\site-packages\timm\models\_factory.py:114: UserWarning: Mapping deprecated model name vit_base_resnet50_384 to current vit_base_r50_s16_384.orig_in21k_ft_in1k.
  model = create_fn(
Traceback (most recent call last):
  File "E:\ControlVideo\inference.py", line 98, in <module>
    vae = AutoencoderKL.from_pretrained(sd_path, subfolder="vae").to(dtype=torch.float16)
  File "E:\ControlVideo\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 558, in from_pretrained
    model = cls.from_config(config, **unused_kwargs)
  File "E:\ControlVideo\venv\lib\site-packages\diffusers\configuration_utils.py", line 210, in from_config
    model = cls(**init_dict)
  File "E:\ControlVideo\venv\lib\site-packages\diffusers\configuration_utils.py", line 569, in inner_init
    init(self, *args, **init_kwargs)
  File "E:\ControlVideo\venv\lib\site-packages\diffusers\models\autoencoder_kl.py", line 86, in __init__
    self.encoder = Encoder(
  File "E:\ControlVideo\venv\lib\site-packages\diffusers\models\vae.py", line 65, in __init__
    down_block = get_down_block(
  File "E:\ControlVideo\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 88, in get_down_block
    raise ValueError("cross_attention_dim must be specified for CrossAttnDownBlock2D")
ValueError: cross_attention_dim must be specified for CrossAttnDownBlock2D
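That ValueError is raised while building the VAE, which in a Stable Diffusion 1.5 checkpoint uses plain DownEncoderBlock2D blocks, never CrossAttnDownBlock2D. So the config.json being read from the vae subfolder is likely the wrong file (for example a UNet config), or the stable-diffusion-v1-5 download is incomplete. A minimal check, assuming the checkpoints layout described above:

import json
from pathlib import Path

# Adjust to your own layout; this path assumes the checkpoints folder
# structure posted earlier in this thread.
cfg = json.loads(Path("checkpoints/stable-diffusion-v1-5/vae/config.json").read_text())

# A healthy SD 1.5 VAE config reports AutoencoderKL and DownEncoderBlock2D.
# Seeing CrossAttnDownBlock2D here means a UNet config ended up in vae/.
print(cfg.get("_class_name"), cfg.get("down_block_types"))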

JPW0080 commented 11 months ago

After revisiting ControlVideo, here is the full installation procedure.

Install Anaconda3 and launch the Anaconda3 Prompt:

conda create -n controlvideo python=3.10.6
conda activate controlvideo
mkdir \AI
cd \AI
git clone https://github.com/YBYBZhang/ControlVideo.git
mkdir \AI\ControlVideo\output
cd \AI\ControlVideo
pip install https://download.pytorch.org/whl/cu116/torchvision-0.14.1%2Bcu116-cp310-cp310-win_amd64.whl
pip install https://download.pytorch.org/whl/cu116/torch-1.13.1%2Bcu116-cp310-cp310-win_amd64.whl
pip install git+https://github.com/openai/CLIP.git
pip install matplotlib
pip install mediapipe
pip install positional-encodings
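A quick sanity check that the cu116 wheels landed (standard torch API, nothing ControlVideo-specific; the expected versions come from the wheel filenames above):

import torch
import torchvision

print(torch.__version__)          # expect 1.13.1+cu116
print(torchvision.__version__)    # expect 0.14.1+cu116
print(torch.cuda.is_available())  # should print True on a working CUDA setup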

Now, minimize Anaconda3 Prompt

Locate requirements.txt in the \AI\ControlVideo directory, delete these lines, and save the file:

clip==1.0
deepspeed==0.8.0
torch==1.13.1+cu116
torchvision==0.14.1+cu116

Refer to https://github.com/YBYBZhang/ControlVideo and download the needed "diffusers" weights as described.

Maximize Anaconda3 Prompt

pip install -r requirements.txt

python inference.py --prompt "Solid platinum, Statue of Liberty, moonwalk" --video_path ./data/moonwalk.mp4 --output_path ./output/ --condition depth_midas --video_length 15 --width 512 --height 512 --frame_rate 2 --smoother_steps 19 20 --version v10 --seed 2023

Disregard the Triton error; if all goes well, inference should commence.

rB080 commented 10 months ago

Hi, I didn't exactly understand how to download the weights, and I'm getting the missing-weight-file error as expected. Can I wget the weights some way?
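Not wget, but huggingface_hub can pull each repo in one call. A sketch, assuming the checkpoints layout JPW0080 posted above and a reasonably recent huggingface_hub (older versions lack the local_dir argument); double-check the repo ids against what inference.py expects:

from huggingface_hub import snapshot_download

# Mirrors the checkpoints/ folder structure shown earlier in this thread.
# flownet.pkl is a separate download linked from the repo README.
for repo_id in [
    "lllyasviel/sd-controlnet-canny",
    "lllyasviel/sd-controlnet-depth",
    "lllyasviel/sd-controlnet-openpose",
    "runwayml/stable-diffusion-v1-5",
]:
    snapshot_download(repo_id=repo_id,
                      local_dir=f"checkpoints/{repo_id.split('/')[-1]}")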