bmaltais / kohya_ss

Apache License 2.0

ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU #1638

Closed Pokilokui closed 8 months ago

Pokilokui commented 11 months ago

I tried entering 'all' in lowercase at the GPU selection prompt, but it does not work.

    Traceback (most recent call last):
      File "D:\AI\Kohya\kohya_ss\train_network.py", line 1009, in <module>
        trainer.train(args)
      File "D:\AI\Kohya\kohya_ss\train_network.py", line 232, in train
        vae.set_use_memory_efficient_attention_xformers(args.xformers)
      File "D:\AI\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 251, in set_use_memory_efficient_attention_xformers
        fn_recursive_set_mem_eff(module)
      File "D:\AI\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
        fn_recursive_set_mem_eff(child)
      File "D:\AI\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
        fn_recursive_set_mem_eff(child)
      File "D:\AI\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
        fn_recursive_set_mem_eff(child)
      File "D:\AI\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 244, in fn_recursive_set_mem_eff
        module.set_use_memory_efficient_attention_xformers(valid, attention_op)
      File "D:\AI\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 203, in set_use_memory_efficient_attention_xformers
        raise ValueError(
    ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
    Traceback (most recent call last):
      File "C:\Users\Elouan\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\Users\Elouan\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "D:\AI\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
      File "D:\AI\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
        args.func(args)
      File "D:\AI\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
        simple_launcher(args)
      File "D:\AI\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
        raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['D:\AI\Kohya\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=D:/AI/Kohya/kunaboto/kunaboto style/image', '--resolution=512,512', '--output_dir=D:/AI/Kohya/kunaboto/kunaboto style/model', '--logging_dir=D:/AI/Kohya/kunaboto/kunaboto style/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=128', '--output_name=Kunaboto style', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1300', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.
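
The check that fails here is torch.cuda.is_available(); a minimal way to confirm it is sketched below (a diagnostic sketch, assuming it is run with the kohya_ss venv's python.exe):

    # Diagnostic sketch: run with the venv's python.exe.
    # A version string ending in "+cpu" (or CUDA reported as None) means the
    # installed PyTorch wheel has no CUDA support, so --xformers cannot work.
    import torch

    print(torch.__version__)          # e.g. "2.1.0+cu121" vs. "2.1.0+cpu"
    print(torch.version.cuda)         # None on a CPU-only build
    print(torch.cuda.is_available())  # must be True for xformers attention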

1464206376 commented 11 months ago

Check whether 'all' was entered as 'a' or 'ALL' during installation; that is not accepted, and it must be typed in lowercase.

sammcj commented 11 months ago

Came here to log this as well. It occurs with the Docker image, which used to work fine but now doesn't; it looks like xformers was installed with a version that wasn't built with CUDA support.

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.1.0+cu121)
    Python  3.10.11 (you have 3.10.13)
...
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 2, 1, 40) (torch.float32)
     key         : shape=(1, 2, 1, 40) (torch.float32)
     value       : shape=(1, 2, 1, 40) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`flshattF` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
    Operator wasn't built - see `python -m xformers.info` for more info
`tritonflashattF` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
    requires A100 GPU
    Only work on pre-MLIR triton for now
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    Operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    xFormers wasn't build with CUDA support
    max(query.shape[-1] != value.shape[-1]) > 32
    Operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 40
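
One way to confirm the mismatch sammcj describes is to compare the CUDA build reported by the installed torch wheel with the installed xformers version (a minimal sketch, run inside the container or venv; `python -m xformers.info`, which the log itself suggests, prints the same details in more depth):

    # If torch reports cu121 but xformers was compiled against cu118 (as in
    # the warning above), xformers must be reinstalled for the matching pair.
    import torch
    import xformers

    print("torch   :", torch.__version__, "CUDA", torch.version.cuda)
    print("xformers:", xformers.__version__)

Reinstalling an xformers build that matches the installed torch/CUDA combination (or pinning torch back to the version xformers was built for) is the usual fix; the exact versions to pin depend on the image.
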
sammcj commented 11 months ago

It also looks like TensorFlow was built for CPU only, without any AVX acceleration:

This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.

To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

h3clikejava commented 11 months ago

I encountered the same problem; it appeared suddenly. How can it be solved?

sarojkumarss commented 11 months ago

    [Dataset 0]
    loading image sizes.
    100%|█████████████████████████████████████████████████████████████████████████████| 2626/2626 [00:13<00:00, 188.47it/s]
    make buckets
    min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
    number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
    bucket 0: resolution (320, 640), count: 100
    bucket 1: resolution (384, 512), count: 400
    bucket 2: resolution (384, 576), count: 1000
    bucket 3: resolution (384, 640), count: 200
    bucket 4: resolution (448, 576), count: 100
    bucket 5: resolution (512, 384), count: 200
    bucket 6: resolution (512, 448), count: 100
    bucket 7: resolution (512, 512), count: 2600
    bucket 8: resolution (576, 384), count: 400
    bucket 9: resolution (576, 448), count: 100
    mean ar error (without repeats): 0.00021692813305737463
    preparing accelerator
    loading model for process 0/1
    load StableDiffusion checkpoint: D:/Lora Training/realisticVisionV51_v20Novae.safetensors
    UNet2DConditionModel: 64, 8, 768, False, False
    loading u-net:
    loading vae:
    loading text encoder:
    Enable xformers for U-Net
    Traceback (most recent call last):
      File "D:\Kohya\kohya_ss\train_network.py", line 1009, in <module>
        trainer.train(args)
      File "D:\Kohya\kohya_ss\train_network.py", line 232, in train
        vae.set_use_memory_efficient_attention_xformers(args.xformers)
      File "D:\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 251, in set_use_memory_efficient_attention_xformers
        fn_recursive_set_mem_eff(module)
      File "D:\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
        fn_recursive_set_mem_eff(child)
      File "D:\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
        fn_recursive_set_mem_eff(child)
      File "D:\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
        fn_recursive_set_mem_eff(child)
      File "D:\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 244, in fn_recursive_set_mem_eff
        module.set_use_memory_efficient_attention_xformers(valid, attention_op)
      File "D:\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 203, in set_use_memory_efficient_attention_xformers
        raise ValueError(
    ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
    Traceback (most recent call last):
      File "C:\Users\saroj\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\Users\saroj\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "D:\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
      File "D:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
        args.func(args)
      File "D:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
        simple_launcher(args)
      File "D:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
        raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['D:\Kohya\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=D:/Lora Training/realisticVisionV51_v20Novae.safetensors', '--train_data_dir=D:/Lora Training/Lora Training/img', '--reg_data_dir=D:/Lora Training/Lora Training/reg', '--resolution=512,512', '--output_dir=D:/Lora Training/Lora Training/model', '--logging_dir=D:/Lora Training/Lora Training/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=Sw3tha_T_1', '--lr_scheduler_num_cycles=4', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=10400', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

I have the same issue. ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU

Can anyone help us with this issue?

sarojkumarss commented 11 months ago

Got the solution from another post: run `accelerate config` in PowerShell, and when it asks which GPU(s) to use, type 'all' (it must be lowercase, not 'ALL').
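
To confirm what `accelerate config` actually saved, the answers are written to a YAML file in the Hugging Face cache; a minimal check is sketched below (the path assumes the default cache location and may differ if HF_HOME or XDG_CACHE_HOME is overridden):

    # Print the saved accelerate config; the GPU prompt answer should appear
    # as "gpu_ids: all" (lowercase) and "use_cpu" should not be true.
    import os

    cfg = os.path.expanduser("~/.cache/huggingface/accelerate/default_config.yaml")
    if os.path.exists(cfg):
        with open(cfg, encoding="utf-8") as f:
            print(f.read())
    else:
        print("No default_config.yaml found - run `accelerate config` first.")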

Shark1337 commented 10 months ago

I'm in the same situation with WSL. I've tried everything and nothing works; it's strange, and I don't quite understand why this is happening...

heartlocket commented 9 months ago

Seems broken. Does anyone know a decent LoRA training alternative with a well-maintained repo?

segalinc commented 8 months ago

Made it work by adding --gpu_ids=0 at line 724 of lora_gui.py.

ZhangEnsure commented 6 months ago

made it work adding --gpu_ids=0 at line 724 of lora_gui.py

Hello, could you show your code? Thank you!
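
For readers asking to see that change: below is a hypothetical sketch of the kind of edit segalinc describes. The exact contents of line 724 differ between kohya_ss versions, and the `run_cmd` variable name is an assumption about how the GUI assembles its `accelerate launch` command:

    # Hypothetical illustration only: where lora_gui.py builds the accelerate
    # launch command string, force a specific GPU by appending --gpu_ids
    # before the training script arguments are added.
    run_cmd = "accelerate launch"       # assumed variable used by the GUI
    run_cmd += " --gpu_ids=0"           # the addition segalinc describes
    run_cmd += ' "./train_network.py"'  # remaining arguments follow as before

If the `accelerate config` fix above works for you, it is the less invasive option, since it avoids editing the GUI code.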

spikers commented 5 months ago

The error is apparently saying that xformers' memory-efficient attention is only available for NVIDIA GPUs. I use an AMD GPU, so I just disabled it: under "Advanced" you'll see "CrossAttention", which is "xformers" by default; I set that to "none".

For me, it also complained about fp16 or something. At the top, under "Accelerate Launch" > "Mixed precision", use "no"; I think that forces it to use full 32-bit precision. I was able to CPU-train a model last night and it gave me a .safetensors file, but it took forever (6 hours) and the results were awful, so take that with a grain of salt.

Also, there are a few required fields. You'll at least need Model > Image folder (containing the training image subfolders) and Folders > Output directory for the trained model.