RuntimeError: Failed to import diffusers.models.autoencoders.autoencoder_kl - after installing Triton 2.1.0 for Windows

Zyin055 commented 6 months ago

On a fresh install of v23.0.15, after installing Triton 2.1.0 for Windows via step 3 of the setup wizard, I get an error when trying to train a LoRA with a known working config file. This happens for both SD1.5 and SDXL.

This error only happens after installing Triton 2.1.0 for Windows. It worked fine when only doing steps 1 (install) and 2 (CuDNN files) in the setup wizard. Step 3 (Triton) is what broke it.

Windows 10, RTX 3060 12GB, 48GB RAM, Python 3.10.9 (had to upgrade from 3.10.6 for this)

PS what does Triton even do? Is it worth it for me to try and resolve this issue?

Traceback (most recent call last):
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\diffusers\utils\import_utils.py", line 710, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "C:\Users\Zyin\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\diffusers\models\autoencoders\__init__.py", line 1, in <module>
    from .autoencoder_asym_kl import AsymmetricAutoencoderKL
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\diffusers\models\autoencoders\autoencoder_asym_kl.py", line 23, in <module>
    from .vae import DecoderOutput, DiagonalGaussianDistribution, Encoder, MaskConditionDecoder
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\diffusers\models\autoencoders\vae.py", line 24, in <module>
    from ..attention_processor import SpatialNorm
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\diffusers\models\attention_processor.py", line 32, in <module>
    import xformers.ops
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\xformers\ops\__init__.py", line 8, in <module>
    from .fmha import (
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 10, in <module>
    from . import attn_bias, cutlass, decoder, flash, small_k, triton, triton_splitk
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\xformers\ops\fmha\triton_splitk.py", line 21, in <module>
    if TYPE_CHECKING or _has_triton21():
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\xformers\ops\common.py", line 192, in _has_triton21
    if not _has_a_version_of_triton():
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\xformers\ops\common.py", line 176, in _has_a_version_of_triton
    import triton  # noqa: F401
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\__init__.py", line 8, in <module>
    from .runtime import (
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\runtime\__init__.py", line 1, in <module>
    from .autotuner import (Autotuner, Config, Heuristics, OutOfResources, autotune, heuristics)
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\runtime\autotuner.py", line 7, in <module>
    from ..testing import do_bench
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\testing.py", line 7, in <module>
    from . import language as tl
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\language\__init__.py", line 6, in <module>
    from .standard import (
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\language\standard.py", line 3, in <module>
    from ..runtime.jit import jit
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\runtime\jit.py", line 10, in <module>
    from ..runtime.driver import driver
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\runtime\driver.py", line 1, in <module>
    from ..backends import backends
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\backends\__init__.py", line 50, in <module>
    backends = _discover_backends()
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\backends\__init__.py", line 44, in _discover_backends
    driver = _load_module(name, os.path.join(root, name, 'driver.py'))
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\backends\__init__.py", line 12, in _load_module
    spec.loader.exec_module(module)
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\triton\backends\nvidia\driver.py", line 18, in <module>
    library_dir += [os.path.join(os.environ.get("CUDA_PATH"), "lib", "x64")]
  File "C:\Users\Zyin\AppData\Local\Programs\Python\Python310\lib\ntpath.py", line 104, in join
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\sd-scripts\sdxl_train_network.py", line 7, in <module>
    from library import sdxl_model_util, sdxl_train_util, train_util
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\sd-scripts\library\sdxl_model_util.py", line 7, in <module>
    from diffusers import AutoencoderKL, EulerDiscreteScheduler, UNet2DConditionModel
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\diffusers\utils\import_utils.py", line 701, in __getattr__
    value = getattr(module, name)
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\diffusers\utils\import_utils.py", line 700, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\diffusers\utils\import_utils.py", line 712, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import diffusers.models.autoencoders.autoencoder_kl because of the following error (look up to see its traceback):
expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "C:\Users\Zyin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Zyin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\Stuff\AI\SD\kohya_ss v23.0.15\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Stuff\\AI\\SD\\kohya_ss v23.0.15\\venv\\Scripts\\python.exe', 'C:\\Stuff\\AI\\SD\\kohya_ss v23.0.15/sd-scripts/sdxl_train_network.py', '--bucket_no_upscale', '--bucket_reso_steps=128', '--cache_latents', '--caption_extension=.txt', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--gradient_checkpointing', '--learning_rate=1.0', '--logging_dir=C:\\Stuff\\AI\\SD\\kohya_ss Training\\LoRA\\SDXL\\mylora\\log', '--lr_scheduler=cosine', '--lr_scheduler_num_cycles=60', '--lr_warmup_steps=45', '--max_data_loader_n_workers=0', '--max_grad_norm=1', '--resolution=1024,1024', '--max_train_steps=2250', '--min_snr_gamma=5', '--mixed_precision=bf16', '--network_alpha=4', '--network_dim=8', '--network_dropout=0.2', '--network_module=networks.lora', '--no_half_vae', '--multires_noise_iterations=8', '--multires_noise_discount=0.35', '--optimizer_args', 'decouple=True', 'weight_decay=0.01', 'd_coef=1', 'use_bias_correction=True', 'safeguard_warmup=True', 'betas=0.9,0.99', '--optimizer_type=Prodigy', '--output_dir=C:\\Stuff\\AI\\SD\\kohya_ss Training\\LoRA\\SDXL\\mylora\\model', '--output_name=mylora', '--pretrained_model_name_or_path=C:/Stuff/AI/SD/shared/models/Stable-diffusion/03~sdxl/01~photoreal/juggernautXL_v9Rundiffusionphoto2.safetensors', '--save_every_n_epochs=2', '--save_model_as=safetensors', '--save_precision=bf16', '--scale_weight_norms=1', '--text_encoder_lr=1.0', '--train_batch_size=2', '--train_data_dir=C:\\Stuff\\AI\\SD\\kohya_ss Training\\LoRA\\SDXL\\mylora\\images', '--unet_lr=1.0', '--xformers', '--sample_sampler=k_dpm_2_a', '--sample_prompts=C:\\Stuff\\AI\\SD\\kohya_ss Training\\LoRA\\SDXL\\mylora\\model\\sample\\prompt.txt', '--sample_every_n_epochs=2']' returned non-zero exit status 1.

waifuista commented 6 months ago

The same thing is happening to me, but I don't have this "triton" downloaded. If I download it, will that fix the problem?

michikora commented 6 months ago

Do you have the CUDA Toolkit installed? If not, try installing it: https://developer.nvidia.com/cuda-11.3.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local I had the same issue and solved it by installing this.
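
For anyone checking this: the last frame of the traceback above joins os.environ.get("CUDA_PATH") with "lib" and "x64", so the TypeError suggests that CUDA_PATH was not set in the environment that launched training. Below is a minimal sketch for checking that, assuming the Toolkit installer is what normally sets CUDA_PATH on Windows. The helper name check_cuda_path is made up for illustration and is not part of kohya_ss, Triton or xformers.

```python
import os
from pathlib import Path


def check_cuda_path(env=None):
    """Return (ok, message) describing whether CUDA_PATH looks usable.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    cuda_path = env.get("CUDA_PATH")
    if cuda_path is None:
        # Same situation as in the traceback: os.path.join(None, "lib", "x64")
        # raises TypeError because the first argument is None.
        return False, "CUDA_PATH is not set"
    lib_dir = Path(cuda_path) / "lib" / "x64"
    if not lib_dir.is_dir():
        return False, f"CUDA_PATH is set, but {lib_dir} does not exist"
    return True, f"found {lib_dir}"


if __name__ == "__main__":
    ok, message = check_cuda_path()
    print("OK:" if ok else "Problem:", message)
```

Run it with the same python.exe from the venv folder and in the same terminal that starts the GUI, since a variable set by an installer is usually not visible to terminals that were already open.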

waifuista commented 6 months ago

yes i have :/

zilvergrafix commented 6 months ago

Same issue here. I think Triton is the culprit?

Reinstalling the venv folder and running option 1 again...
Not selecting the Triton install this time.

DONE! It is working and training now!

djp3k05 commented 6 months ago

Same here. You just need to delete the "triton" folders, no need to reinstall everything. PS: CUDA is installed.
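
To see which folders that would be, here is a small sketch that lists them without deleting anything. It assumes the folders in question are the ones pip normally creates in the venv's site-packages (a "triton" package folder plus a "triton-<version>.dist-info" folder). The function name find_triton_dirs is hypothetical.

```python
from pathlib import Path


def find_triton_dirs(site_packages):
    """List folders named 'triton' or 'triton-*' inside site_packages.

    Hypothetical helper: it only reports paths and never deletes anything.
    """
    root = Path(site_packages)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and (p.name == "triton" or p.name.startswith("triton-"))
    )


if __name__ == "__main__":
    import sysconfig

    # Run this with the venv's python.exe so the path points at the
    # venv's own site-packages rather than the system-wide one.
    for path in find_triton_dirs(sysconfig.get_paths()["purelib"]):
        print(path)
```

Printing the paths first makes it easy to confirm that only Triton is affected before removing anything by hand.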

bmaltais commented 6 months ago

I will add a note about Triton... I added it as an option because some people complained about the error message... but apparently this custom Triton build does not help and actually makes things worse...

efhosci commented 6 months ago

Was just struggling with this same problem, though in my case the traceback failed with "RuntimeError: Cannot find ptxas". Deleting the triton folders seems to have fixed it, at least for now.

Edit: Maybe my celebration was premature, now getting "CUDA out of memory" errors, PyTorch is filling up my memory for some reason.

Edit edit: Seems to be working now, possibly fixed by reinstalling CUDA and drivers again
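
For the "Cannot find ptxas" variant, a rough way to check whether ptxas is reachable at all is sketched below. How Triton itself searches for ptxas is not shown in this thread, so this only looks in the Toolkit's usual bin folder under CUDA_PATH and then on PATH. The function name find_ptxas is made up for illustration.

```python
import os
import shutil
from pathlib import Path


def find_ptxas(env=None):
    """Return a path to ptxas, or None if it cannot be found.

    Hypothetical helper: checks <CUDA_PATH>/bin first, then PATH.
    """
    env = os.environ if env is None else env
    cuda_path = env.get("CUDA_PATH")
    if cuda_path:
        # ptxas ships with the CUDA Toolkit in its bin directory.
        for name in ("ptxas.exe", "ptxas"):
            candidate = Path(cuda_path) / "bin" / name
            if candidate.is_file():
                return str(candidate)
    # Fall back to a normal PATH lookup using the given environment.
    return shutil.which("ptxas", path=env.get("PATH", ""))


if __name__ == "__main__":
    print(find_ptxas() or "ptxas not found via CUDA_PATH or PATH")
```

If this prints "not found", reinstalling the CUDA Toolkit, as mentioned above, is a plausible next step.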

Zyin055 commented 6 months ago

> Edit: Maybe my celebration was premature, now getting "CUDA out of memory" errors, PyTorch is filling up my memory for some reason.

This happened to me during testing while trying to figure out the Triton issue, turned out I had the Dreambooth tab open instead of the LoRA tab

efhosci commented 6 months ago

> > Edit: Maybe my celebration was premature, now getting "CUDA out of memory" errors, PyTorch is filling up my memory for some reason.
>
> This happened to me during testing while trying to figure out the Triton issue, turned out I had the Dreambooth tab open instead of the LoRA tab

Oh shoot, that might have been my problem too, I don't remember if I switched to the LoRA tab after reloading everything else.

bmaltais / kohya_ss #2150