Reproduction / 复现过程
Steps to reproduce:
1. Create a new virtual environment with `python -m venv venv`
2. Activate it with `venv\Scripts\activate.bat`
3. Install the requirements file with `pip install -r requirements.txt`
Errors change depending on what workarounds are used.
Using the above steps:
❌ Error from DeepSpeed: "Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops." (This appears to be a problem on DeepSpeed's end, and there's no known solution.)
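(DeepSpeed builds its ops against an already-installed torch at install time, so presumably installing torch first and then re-running `pip install -r requirements.txt` would get past this step; I have not verified it.)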
Using the steps found on https://huggingface.co/THUDM/CogVideoX-5b-I2V:
⚠️ Error: "URLs must start with http://". This version of diffusers doesn't support local files.
✔️ Resolved by running a localhost server (sketch below).
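For reference, a minimal sketch of that workaround (the file name and port are placeholders, not from my actual script):

```python
# In a separate terminal, serve the working directory over HTTP first:
#   python -m http.server 8000
from diffusers.utils import load_image

# This diffusers version rejects local paths, so the input image is
# fetched from the local server instead. "input.jpg" is a placeholder.
image = load_image("http://localhost:8000/input.jpg")
```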
❌ New error:
File "D:\AI\CogVideo\venv\lib\site-packages\transformers\utils\import_utils.py", line 1639, in requires_backends
raise ImportError("".join(failed))
ImportError:
T5Tokenizer requires the SentencePiece library but it was not found in your environment.
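(Presumably this particular error can be cleared with `pip install sentencepiece`; the point is that the pinned requirements don't pull it in.)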
Attempted to install the requirements again with --no-deps:
❌ `pip install -r requirements.txt --no-deps` results in a torch build without CUDA support.
✔️ Resolved by running: `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117`
Looking in indexes: https://download.pytorch.org/whl/cu117
⚠️ Red warning messages: "swissarmytransformer 0.4.12 requires boto3, which is not installed."
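(This warning is presumably harmless for inference; `pip install boto3` should clear it.)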
✔️ Script begins running, downloads resources, runs generation (~12 minutes)
❌ Error:
Traceback (most recent call last):
File "D:\AI\CogVideo\my_test.py", line 18, in <module>
video = pipe(
File "D:\AI\CogVideo\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\pipelines\cogvideo\pipeline_cogvideox_image2video.py", line 826, in __call__
video = self.decode_latents(latents)
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\pipelines\cogvideo\pipeline_cogvideox_image2video.py", line 406, in decode_latents
frames = self.vae.decode(latents).sample
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 1278, in decode
decoded = self._decode(z).sample
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 1235, in _decode
return self.tiled_decode(z, return_dict=return_dict)
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 1431, in tiled_decode
tile, conv_cache = self.decoder(tile, conv_cache=conv_cache)
File "D:\AI\CogVideo\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 963, in forward
hidden_states, new_conv_cache["mid_block"] = self.mid_block(
File "D:\AI\CogVideo\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 529, in forward
hidden_states, new_conv_cache[conv_cache_key] = resnet(
File "D:\AI\CogVideo\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 291, in forward
hidden_states, new_conv_cache["norm1"] = self.norm1(hidden_states, zq, conv_cache=conv_cache.get("norm1"))
File "D:\AI\CogVideo\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AI\CogVideo\venv\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 177, in forward
z_first = F.interpolate(z_first, size=f_first_size)
File "D:\AI\CogVideo\venv\lib\site-packages\torch\nn\functional.py", line 3933, in interpolate
return torch._C._nn.upsample_nearest3d(input, output_size, scale_factors)
RuntimeError: "upsample_nearest3d_out_frame" not implemented for 'BFloat16'
❌ No output file.
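For what it's worth, the crash is in the VAE decode: this torch/cu117 build has no BFloat16 kernel for `upsample_nearest3d`, and the model card loads the model in bfloat16. A minimal sketch of a possible workaround, assuming float16 is acceptable for this pipeline (I have not verified output quality):

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline

# Loading in float16 instead of bfloat16 sidesteps the missing
# BFloat16 "upsample_nearest3d" kernel in this torch build.
# (float16 is my assumption; the model card uses bfloat16.)
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.float16
)
pipe.to("cuda")
```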
Expected behavior / 期待表现
In a new virtual environment, `pip install -r requirements.txt` should install the correct dependencies and allow the use of any default script or script found on HuggingFace.
I would like to add:
I was able to get this to work after installing CUDA/cu121. Regardless, the dependencies should be looked at if there is no "stable" branch.
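(For anyone else hitting this, the cu121 reinstall is presumably `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`, mirroring the cu117 command above.)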
System Info / 系統信息
OS: Windows 10
GPU: NVIDIA RTX 4090
Python: 3.10.7
Information / 问题信息