a-r-r-o-w / cogvideox-factory

Memory optimized finetuning scripts for CogVideoX using TorchAO and DeepSpeed
Apache License 2.0

Windows Errors: Running /training #81

Open oliverban opened 2 weeks ago

oliverban commented 2 weeks ago

System Info

CUDA 12.4 Python 3.12 TORCH 2.5.1+cu124 2x3090

Information

Is there any doc or step-by-step guide for Windows training? I have installed all the requirements, but I still get an error when I run training*.py, see below (cogfac is my conda environment):

(cogfac) C:\Users\Oliver\Documents\Github\cogvideox-factory>python training/cogvideox_text_to_video_lora.py
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Traceback (most recent call last):
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\transformers\utils\import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\importlib\__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\transformers\modeling_utils.py", line 59, in <module>
    from .quantizers import AutoHfQuantizer, HfQuantizer
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\transformers\quantizers\__init__.py", line 14, in <module>
    from .auto import AutoHfQuantizer, AutoQuantizationConfig
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\transformers\quantizers\auto.py", line 44, in <module>
    from .quantizer_torchao import TorchAoHfQuantizer
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\transformers\quantizers\quantizer_torchao.py", line 35, in <module>
    from torchao.quantization import quantize_
ImportError: cannot import name 'quantize_' from 'torchao.quantization' (C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\torchao\quantization\__init__.py). Did you mean: 'Quantizer'?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\utils\import_utils.py", line 853, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\importlib\__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\loaders\unet.py", line 46, in <module>
    from .lora_pipeline import LORA_WEIGHT_NAME, LORA_WEIGHT_NAME_SAFE, TEXT_ENCODER_NAME, UNET_NAME
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\loaders\lora_pipeline.py", line 36, in <module>
    from .lora_base import LoraBaseMixin
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\loaders\lora_base.py", line 44, in <module>
    from transformers import PreTrainedModel
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\transformers\utils\import_utils.py", line 1766, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\transformers\utils\import_utils.py", line 1780, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'quantize_' from 'torchao.quantization' (C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\torchao\quantization\__init__.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\utils\import_utils.py", line 853, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\importlib\__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\models\autoencoders\__init__.py", line 1, in <module>
    from .autoencoder_asym_kl import AsymmetricAutoencoderKL
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\models\autoencoders\autoencoder_asym_kl.py", line 23, in <module>
    from .vae import DecoderOutput, DiagonalGaussianDistribution, Encoder, MaskConditionDecoder
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\models\autoencoders\vae.py", line 25, in <module>
    from ..unets.unet_2d_blocks import (
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\models\unets\__init__.py", line 6, in <module>
    from .unet_2d import UNet2DModel
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\models\unets\unet_2d.py", line 24, in <module>
    from .unet_2d_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\models\unets\unet_2d_blocks.py", line 36, in <module>
    from ..transformers.dual_transformer_2d import DualTransformer2DModel
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\models\transformers\__init__.py", line 13, in <module>
    from .prior_transformer import PriorTransformer
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\models\transformers\prior_transformer.py", line 9, in <module>
    from ...loaders import PeftAdapterMixin, UNet2DConditionLoadersMixin
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\utils\import_utils.py", line 843, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\utils\import_utils.py", line 855, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import diffusers.loaders.unet because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'quantize_' from 'torchao.quantization' (C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\torchao\quantization\__init__.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Oliver\Documents\Github\cogvideox-factory\training\cogvideox_text_to_video_lora.py", line 37, in <module>
    from diffusers import (
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\utils\import_utils.py", line 844, in __getattr__
    value = getattr(module, name)
            ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\utils\import_utils.py", line 843, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\diffusers\utils\import_utils.py", line 855, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import diffusers.models.autoencoders.autoencoder_kl_cogvideox because of the following error (look up to see its traceback):
Failed to import diffusers.loaders.unet because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'quantize_' from 'torchao.quantization' (C:\Users\Oliver\MiniConda3\envs\cogfac\Lib\site-packages\torchao\quantization\__init__.py)

KaptainSisay commented 2 weeks ago

Same here, but I can't install the requirements at all because torchao 0.4.0 or newer isn't available for Windows. I'm not sure there's a workaround besides compiling torchao from source or running on Linux.

a-r-r-o-w commented 2 weeks ago

I am unsure how to fix it, but this looks like an environment issue related to the torchao installation. Could you try doing a clean install with USE_CPP=0 pip install --force-reinstall torchao?

Unfortunately I don't have a Windows device to test on, but I've heard people have had success doing so. If the latest torchao version does not work, could you try installing one of the older stable versions? We don't depend on their newest features. Gentle ping to @Nojahhh in case he has encountered this, since he's been super helpful with Windows-related issues.
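One way to verify a reinstall worked before relaunching the full training script is to check that the module attribute transformers needs actually exists. Here is a minimal sketch; has_attr is a hypothetical helper, not part of this repo or of torchao:

```python
import importlib


def has_attr(module_name: str, attr: str) -> bool:
    """Return True if module_name imports cleanly and exposes attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing package and a broken install.
        return False
    return hasattr(module, attr)


# The failing import in the traceback above is equivalent to:
#   has_attr("torchao.quantization", "quantize_")
# which should return True once a compatible torchao build is installed.
# Demonstrated here with a stdlib module so the check itself is testable:
print(has_attr("math", "sqrt"))  # -> True
```

If the torchao check still returns False after reinstalling, the environment is likely picking up a stale or CPU-only build.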