AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Textual inversion does not work on RTX 3060 V2 #8088

Closed by plaidpants 1 year ago

plaidpants commented 1 year ago

Is there an existing issue for this?

What happened?

I replaced my RTX 2070 SUPER (8 GB of VRAM) with an RTX 3060 V2 (12 GB of VRAM), and textual inversion no longer works with exactly the same settings and the same A1111 build. I even deleted my venv and retried, just to be sure. With the same settings and training data that work fine on the 2070, I get the error below on the 3060. Before it exits with this error, the training also appears to drift away from the target subject instead of moving closer to it.

Training at rate of 0.05 until step 2
Preparing dataset...
100%|██████████████████████████████████████████| 84/84 [00:05<00:00, 16.29it/s]
No saved optimizer exists in checkpoint
Training at rate of 0.02 until step 4
Training at rate of 0.01 until step 10
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  6.49it/s]
Training at rate of 0.005 until step 40
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  6.57it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  6.70it/s]
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  6.63it/s]
Training at rate of 0.002 until step 100
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  6.92it/s]
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  6.34it/s]
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  6.56it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  6.68it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  6.76it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  6.75it/s]
Training at rate of 0.001 until step 600
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  6.57it/s]
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  6.57it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  6.68it/s]
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  6.65it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  6.72it/s]
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  6.48it/s]
Training textual inversion [Epoch 28: 2/6] loss: nan:   0%| | 169/100000 [05:42
██████████████| 20/20 [00:03<00:00,  6.51it/s]
Traceback (most recent call last):
  File "T:\stable-diffusion-webui4\modules\textual_inversion\textual_inversion.py", line 531, in train_embedding
    save_embedding(embedding, optimizer, checkpoint, embedding_name_every, last_saved_file, remove_cached_checksum=True)
  File "T:\stable-diffusion-webui4\modules\textual_inversion\textual_inversion.py", line 651, in save_embedding
    embedding.save(filename)
  File "T:\stable-diffusion-webui4\modules\textual_inversion\textual_inversion.py", line 69, in save
    'hash': self.checksum(),
  File "T:\stable-diffusion-webui4\modules\textual_inversion\textual_inversion.py", line 84, in checksum
    self.cached_checksum = f'{const_hash(self.vec.reshape(-1) * 100) & 0xffff:04x}'
  File "T:\stable-diffusion-webui4\modules\textual_inversion\textual_inversion.py", line 81, in const_hash
    r = (r * 281 ^ int(v) * 997) & 0xFFFFFFFF
ValueError: cannot convert float NaN to integer
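The crash itself is a downstream symptom. The loss has already become nan by step 169, so the embedding vector contains NaN, and when the embedding is saved, const_hash calls int(v) on every element, which Python refuses for NaN. The sketch below reconstructs only the hashing loop from line 81 of the traceback; starting r at 0 and the safe_checksum helper are assumptions for illustration, not A1111's code. The guard reports a diverged (non-finite) embedding directly instead of failing inside the checksum.

```python
import math


def const_hash(values):
    # Loop body copied from the traceback (textual_inversion.py, line 81).
    # ASSUMPTION: r starts at 0; the rest of the real function is not shown.
    r = 0
    for v in values:
        r = (r * 281 ^ int(v) * 997) & 0xFFFFFFFF  # int(nan) raises ValueError here
    return r


def safe_checksum(vec):
    """Hypothetical guard: refuse to hash an embedding containing NaN or inf."""
    values = [float(v) * 100 for v in vec]  # mirrors self.vec.reshape(-1) * 100
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise ValueError(
            f"embedding has {len(bad)} non-finite value(s), first at index {bad[0]}; "
            "training likely diverged (loss: nan)"
        )
    return f"{const_hash(values) & 0xffff:04x}"


if __name__ == "__main__":
    print(safe_checksum([0.01, -0.02, 0.03]))  # a 4-digit hex string
    try:
        const_hash([float("nan")])
    except ValueError as err:
        print(err)  # cannot convert float NaN to integer
```

In the web UI the input is a torch tensor rather than a list of floats, so an equivalent check there would presumably use torch.isfinite on the vector before saving. That substitution is an assumption, and it only makes the error clearer; it does not fix the underlying NaN loss.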

Steps to reproduce the problem

  1. Install on an RTX 3060 V2 with 12 GB of VRAM
  2. Set up a textual inversion training session
  3. Click Train

What should have happened?

Training should proceed, and the sample images generated during training should move closer to the target subject.

Commit where the problem happens

Unsure. It was working on my 2070 until I replaced it with the 3060 on 2023-FEB-23, and it continues to work with the 2070 when I swap it back.

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

set COMMANDLINE_ARGS= --xformers

List of extensions

Extension | URL | Version | Update
-- | -- | -- | --
gif2gif | https://github.com/LonicaMewinsky/gif2gif.git | 5d60006a (Thu Feb 23 21:49:51 2023) | unknown
sd-dynamic-prompts | https://github.com/adieyal/sd-dynamic-prompts | 9a38f02e (Thu Feb 16 19:39:54 2023) | unknown
sd-webui-additional-networks | https://github.com/kohya-ss/sd-webui-additional-networks.git | 822f2136 (Thu Feb 16 12:57:55 2023) | unknown
sd-webui-controlnet | https://github.com/Mikubill/sd-webui-controlnet | d4573603 (Thu Feb 23 11:55:52 2023) | unknown
sd-webui-riffusion | https://github.com/enlyth/sd-webui-riffusion | 044ff5fa (Wed Jan 25 13:24:31 2023) | unknown
sd_dreambooth_extension | https://github.com/d8ahazard/sd_dreambooth_extension | 43ae9d55 (Sat Feb 11 20:49:57 2023) | unknown
sdweb-merge-board | https://github.com/bbc-mc/sdweb-merge-board.git | f59c60fd (Wed Jan 25 15:00:09 2023) | unknown
stable-diffusion-webui-aesthetic-gradients | https://github.com/AUTOMATIC1111/stable-diffusion-webui-aesthetic-gradients | 2624e5dd (Fri Jan 6 10:59:30 2023) | unknown
stable-diffusion-webui-depthmap-script | https://github.com/thygate/stable-diffusion-webui-depthmap-script.git | 189e30ad (Mon Feb 6 12:13:59 2023) | unknown
stable-diffusion-webui-instruct-pix2pix | https://github.com/Klace/stable-diffusion-webui-instruct-pix2pix.git | a5a4c6b8 (Sun Feb 5 22:19:05 2023) | unknown
stable-diffusion-webui-wildcards | https://github.com/AUTOMATIC1111/stable-diffusion-webui-wildcards | 6ed81ed1 (Sat Oct 29 16:18:48 2022) | unknown
ultimate-upscale-for-automatic1111 | https://github.com/Coyote-A/ultimate-upscale-for-automatic1111.git | 0a3d03a4 (Tue Feb 7 06:07:23 2023) | unknown
LDSR | built-in | |
Lora | built-in | |
ScuNET | built-in | |
SwinIR | built-in | |
prompt-bracket-checker | built-in | |
Additional information

No response

ClashSAN commented 1 year ago

See if you can manually install C43H66N12O12S2's build of xformers; the recent xformers version breaks training. Or just remove xformers from the args entirely and see if that works.

plaidpants commented 1 year ago

I swapped back to the 3060. Removing --xformers from COMMANDLINE_ARGS appears to re-enable textual inversion training: it no longer crashes and appears to train correctly. Without xformers, however, batch size is limited to 4, instead of the original 7 on the 2070 (8 GB) or 21 on the 3060 (12 GB). I'm not sure how to point A1111 at the other xformers build from @C43H66N12O12S2, but I will investigate.
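When trying a different xformers build, it helps to confirm which version the web UI's venv actually contains before and after the change. A minimal sketch, assuming it is run with the venv's own interpreter (on Windows typically venv\Scripts\python.exe inside the web UI folder; that path is an assumption). It only reads installed package metadata and does not change which build A1111 loads.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the packages relevant to this issue in the current environment.
    for pkg in ("torch", "xformers"):
        print(pkg, installed_version(pkg) or "not installed")
```

Running it with a different Python interpreter would report that environment's packages, not the web UI's.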