bmaltais / kohya_ss


RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.) #2720

Open oracle9i88 opened 3 months ago

oracle9i88 commented 3 months ago

RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

eftSharptooth commented 3 months ago

This is a PyTorch error; are you by any chance using a 2080 Ti or older GPU? It would also help to post the full log from the terminal window, along with a brief description of what type of training you are attempting (LoRA, finetune) and for which model (SD1.5, SDXL, FLUX).
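If you want to confirm what PyTorch sees, the compute capability of the card is the relevant number; the failing is_sm80 || is_sm90 check most likely comes from an attention backward kernel that is only implemented for Ampere (sm80) and Hopper (sm90) class GPUs. A minimal diagnostic sketch, plain PyTorch, nothing kohya-specific:

import torch

# Report the CUDA compute capability of the active GPU.
# sm80 corresponds to capability (8, 0) (e.g. A100), sm90 to (9, 0) (e.g. H100);
# Turing cards such as the 2080 Ti or 2060 Super report (7, 5), which is why
# an is_sm80 || is_sm90 check fails on them.
major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor}")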

maxanier commented 2 months ago
6ffb08b95e23 FLUX: Gradient checkpointing enabled.
6ffb08b95e23 prepare optimizer, data loader etc.
6ffb08b95e23 enable fp8 training for U-Net.
6ffb08b95e23 enable fp8 training for Text Encoder.
6ffb08b95e23 running training / 学習開始
6ffb08b95e23   num train images * repeats / 学習画像の数×繰り返し回数: 720
6ffb08b95e23   num reg images / 正則化画像の数: 0
6ffb08b95e23   num batches per epoch / 1epochのバッチ数: 720
6ffb08b95e23   num epochs / epoch数: 3
6ffb08b95e23   batch size per device / バッチサイズ: 1
6ffb08b95e23   gradient accumulation steps / 勾配を合計するステップ数 = 1
6ffb08b95e23   total optimization steps / 学習ステップ数: 1600
steps:   0%|          | 0/1600 [00:00<?, ?it/s]
6ffb08b95e23 2024-09-15 19:50:08 INFO     text_encoder is not needed for training. deleting to save memory.  train_network.py:1033
6ffb08b95e23                              unet dtype: torch.float8_e4m3fn, device: cuda:0                    train_network.py:1053
6ffb08b95e23                     INFO     epoch is incremented. current_epoch: 0, epoch: 1      train_util.py:672
6ffb08b95e23 
6ffb08b95e23 epoch 1/3
6ffb08b95e23 Traceback (most recent call last):
6ffb08b95e23   File "/app/sd-scripts/flux_train_network.py", line 520, in <module>
6ffb08b95e23     trainer.train(args)
6ffb08b95e23   File "/app/sd-scripts/train_network.py", line 1178, in train
6ffb08b95e23     accelerator.backward(loss)
6ffb08b95e23   File "/home/1000/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 2155, in backward
6ffb08b95e23     self.scaler.scale(loss).backward(**kwargs)
6ffb08b95e23   File "/home/1000/.local/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
6ffb08b95e23     torch.autograd.backward(
6ffb08b95e23   File "/home/1000/.local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
6ffb08b95e23     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
6ffb08b95e23 RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

steps:   0%|          | 0/1600 [00:04<?, ?it/s]
6ffb08b95e23 Traceback (most recent call last):
6ffb08b95e23   File "/home/1000/.local/bin/accelerate", line 8, in <module>
6ffb08b95e23     sys.exit(main())
6ffb08b95e23   File "/home/1000/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
6ffb08b95e23     args.func(args)
6ffb08b95e23   File "/home/1000/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
6ffb08b95e23     simple_launcher(args)
6ffb08b95e23   File "/home/1000/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
6ffb08b95e23     raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
6ffb08b95e23 subprocess.CalledProcessError: Command '['/usr/local/bin/python', '/app/sd-scripts/flux_train_network.py', '--config_file', '/app/data/MaxTraining/model/config_lora-20240915-194903.toml']' returned non-zero exit status 

I am facing the same issue (at least based on the initial description). I am trying to train a FLUX LoRA on an RTX 2060 Super on Arch Linux (via Docker). I have managed not to run out of memory so far, but training ends with that error. My config: flux_lora_4.json

@eftSharptooth Are the 2080 Ti and lower not supported by PyTorch? My compute capability is supposed to be 7.5, but I don't know whether that is enough. I was able to train SDXL, though; that used torch==2.1.2+cu118 instead of torch==2.4.0+cu124, so either FLUX training requires more features, or the newer torch version changed something internally here.

nvidia-smi
Sun Sep 15 22:07:06 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4 

Edit: From https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/ it seems the Turing architecture corresponds to SM75. So yes, it is not SM80 or SM90.
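For completeness, you can also ask the installed torch build which SM architectures it was compiled for and compare that with what the card reports. A small sketch; this only checks the build, it does not tell you whether a specific kernel (such as the flash-attention backward) accepts sm_75:

import torch

# Compare the architectures compiled into this torch build
# with the compute capability of the local GPU.
print("torch:", torch.__version__, "CUDA:", torch.version.cuda)
print("compiled for:", torch.cuda.get_arch_list())
print("device capability:", torch.cuda.get_device_capability(0))

Even if sm_75 appears in the compiled list, a kernel that hard-requires sm80/sm90 at runtime will still raise this error, which seems to be what happens here.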

maxanier commented 2 months ago

Also happens with torch==2.1.2+cu118, torchvision==0.16.2+cu118, xformers==0.0.23.post1+cu118.

maxanier commented 2 months ago

Workaround by @chenxluo: https://github.com/bmaltais/kohya_ss/issues/2717#issuecomment-2366769178
It works for me on a 2060 Super (although training ultimately has no effect; I don't yet know what is causing that).
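For anyone else landing here: I have not looked into what that workaround changes internally, but the generic escape hatch on pre-Ampere cards is to keep torch.nn.functional.scaled_dot_product_attention away from the flash backend so it falls back to the math or memory-efficient kernels. A rough sketch of that idea only, not the linked patch, and whether it helps depends on where kohya's code hits the failing kernel:

import torch
import torch.nn.functional as F

# Globally disable the flash SDPA backend; attention then falls back to the
# math / memory-efficient implementations, which support backward on older
# architectures. (On torch >= 2.3 the torch.nn.attention.sdpa_kernel context
# manager is the preferred per-call interface.)
torch.backends.cuda.enable_flash_sdp(False)

q = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q)
v = torch.randn_like(q)
F.scaled_dot_product_attention(q, k, v).sum().backward()  # should no longer route to flash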