Open NaughtDZ opened 1 year ago
Got the same error at step 149 and then again at step 449.
I compared https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/602a1864b05075ca4283986e6f5c7d5bce864e11 with https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/d8f8bcb821fa62e943eb95ee05b8a949317326fe (the latter works normally, without the error). For the files named in the error report:
\modules\sd_hijack_clip.py: the two versions are identical
\venv\lib\site-packages\torch\nn\modules\module.py: large number of inconsistencies
\modules\textual_inversion\textual_inversion.py: large number of inconsistencies
So the problem may come from the version change of torch or of textual_inversion?
got the same issue in step 199 tho
Got the same value at step 199 as well
Same here with build 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8, installed on an Ubuntu (focal) machine and on Windows 11.
I get this error only on the Linux machine.
I am using 3000 steps and this embedding learning rate: 0.05:10, 0.02:20, 0.01:60, 0.005:200, 0.002:500, 0.001:3000, 0.0005
It only errors once it reaches the 0.001:3000 segment.
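For readers unfamiliar with the schedule syntax, here is a minimal sketch of how a string like the one above can be read. It assumes each `rate:step` pair means "use this rate until that step" and that a trailing bare rate applies for the rest of training. The function names `parse_schedule` and `rate_at_step` are hypothetical and are not the webui's own code.

```python
def parse_schedule(text):
    """Parse 'rate:until_step, ..., rate' into a list of (rate, until_step or None)."""
    pairs = []
    for chunk in text.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        if ":" in chunk:
            rate, until = chunk.split(":", 1)
            pairs.append((float(rate), int(until)))
        else:
            # A bare rate has no end step: it applies until training stops.
            pairs.append((float(chunk), None))
    return pairs


def rate_at_step(pairs, step):
    """Return the rate for a step, assuming each rate holds while step < until_step."""
    for rate, until in pairs:
        if until is None or step < until:
            return rate
    # Past the last boundary with no trailing bare rate: keep the last rate.
    return pairs[-1][0]


schedule = parse_schedule("0.05:10, 0.02:20, 0.01:60, 0.005:200, 0.002:500, 0.001:3000, 0.0005")
print(rate_at_step(schedule, 600))  # 0.001
```

Under that reading, the failing segment covers steps 500 through 2999.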
Extra data point. I moved a 3070 from the linux machine to the Windows machine, and I now get the error on the Windows machine.
I think I may have found a workaround for this problem on my Mac M1. I kept getting the error at step 499. I looked at "Save an image to log directory every N steps, 0 to disable" and "Save a copy of embedding to log directory every N steps, 0 to disable", and both were set to 500. I changed them to a value above the maximum number of steps so that they would never trigger, and so far so good. It is still training, but it is already past the point where the error normally occurred.
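The step numbers reported in this thread (149, 449, 199, 499, 999, 2999) all sit one below a round number, which is consistent with a periodic save firing on a zero-based step counter. A small sketch of that arithmetic follows. It assumes the counter starts at 0 and the action fires when the number of completed steps is a multiple of N. The function `trigger_steps` is hypothetical, not part of the webui.

```python
def trigger_steps(interval, max_steps):
    """Zero-based step indices at which an 'every N steps' action would fire."""
    if interval <= 0:
        # 0 means disabled, matching the setting's description.
        return []
    return [s for s in range(max_steps) if (s + 1) % interval == 0]


print(trigger_steps(500, 3000))   # [499, 999, 1499, 1999, 2499, 2999]
print(trigger_steps(3001, 3000))  # []
```

Setting the interval above the maximum step count, as described above, yields no trigger steps at all, the same as setting it to 0.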
Is there an existing issue for this?
What happened?
When training an embedding, the following error appears at certain steps, such as step 999 or 2999:
Traceback (most recent call last):
  File "I:\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 479, in train_embedding
    c = shared.sd_model.cond_stage_model(batch.cond_text)
  File "I:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "I:\stable-diffusion-webui\modules\sd_hijack_clip.py", line 233, in forward
    embeddings_list = ", ".join([f'{name} [{embedding.checksum()}]' for name, embedding in used_embeddings.items()])
  File "I:\stable-diffusion-webui\modules\sd_hijack_clip.py", line 233, in <listcomp>
    embeddings_list = ", ".join([f'{name} [{embedding.checksum()}]' for name, embedding in used_embeddings.items()])
  File "I:\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 84, in checksum
    self.cached_checksum = f'{const_hash(self.vec.reshape(-1) * 100) & 0xffff:04x}'
  File "I:\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 81, in const_hash
    r = (r * 281 ^ int(v) * 997) & 0xFFFFFFFF
ValueError: cannot convert float NaN to integer
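The final frame shows the immediate cause: `int(v)` is called on a value that is NaN, which means the embedding vector already contained NaN by the time its checksum was computed. The sketch below reproduces that failure in plain Python and shows one way a caller could guard against it. It assumes the operators missing from the traceback text are multiplications, it works on a plain list of floats rather than a tensor, and `safe_checksum` is a hypothetical helper, not the webui's fix.

```python
import math


def const_hash(values):
    """Rolling hash in the shape shown in the traceback (constants copied from it)."""
    r = 0
    for v in values:
        # int(float("nan")) raises ValueError: cannot convert float NaN to integer
        r = (r * 281 ^ int(v) * 997) & 0xFFFFFFFF
    return r


def safe_checksum(values):
    """Hypothetical guard: return None instead of raising if any value is NaN or infinite."""
    scaled = [v * 100 for v in values]
    if not all(math.isfinite(v) for v in scaled):
        return None
    return f"{const_hash(scaled) & 0xffff:04x}"


print(safe_checksum([0.01, 0.02]))          # a 4-digit hex string
print(safe_checksum([0.01, float("nan")]))  # None
```

A guard like this only hides the symptom. The NaN values themselves originate earlier, during training, so the checksum call is where the problem is detected rather than where it starts.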
Steps to reproduce the problem
What should have happened?
Embedding training works fine
Commit where the problem happens
602a1864b05075ca4283986e6f5c7d5bce864e11
What platforms do you use to access UI ?
Windows
What browsers do you use to access the UI ?
Google Chrome
Command Line Arguments
No response
Additional information, context and logs
No response