Is there an existing issue for this?
[X] I have searched the existing issues and checked the recent builds/commits
What happened?
PR #1795 added support for variable learning rates, but when I try to use it for embedding training,
the schedule is identified correctly and each rate change is printed to the console, yet training still always results in NaN errors.
For example: learn_rate = "0.005:50, 0.001:100, 0.0005:500"
It always ends up with a NaN error at step 500 (the first "checkpoint"), regardless of the values I put in:
[Epoch 12: 20/40] loss: nan: 25% | 499/2000 [08:39<25:13, 1.01s/it]
Traceback (most recent call last):
File "/home/vlado/branches/automatic/modules/textual_inversion/textual_inversion.py", line 397, in train_embedding
processed = processing.process_images(p)
File "/home/vlado/branches/automatic/modules/processing.py", line 464, in process_images
res = process_images_inner(p)
File "/home/vlado/branches/automatic/modules/processing.py", line 557, in process_images_inner
c = prompt_parser.get_multicond_learned_conditioning(shared.sd_model, prompts, p.steps)
File "/home/vlado/branches/automatic/modules/prompt_parser.py", line 203, in get_multicond_learned_conditioning
learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps)
File "/home/vlado/branches/automatic/modules/prompt_parser.py", line 138, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/home/vlado/branches/automatic/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/home/vlado/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vlado/branches/automatic/modules/sd_hijack_clip.py", line 184, in forward
batch_multipliers, remade_batch_tokens, used_custom_terms, hijack_comments, hijack_fixes, token_count = self.process_text(text)
File "/home/vlado/branches/automatic/modules/sd_hijack_clip.py", line 102, in process_text
remade_tokens, fixes, multipliers, current_token_count = self.tokenize_line(line, used_custom_terms, hijack_comments)
File "/home/vlado/branches/automatic/modules/sd_hijack_clip.py", line 77, in tokenize_line
used_custom_terms.append((embedding.name, embedding.checksum()))
File "/home/vlado/branches/automatic/modules/textual_inversion/textual_inversion.py", line 52, in checksum
self.cached_checksum = f'{const_hash(self.vec.reshape(-1) * 100) & 0xffff:04x}'
File "/home/vlado/branches/automatic/modules/textual_inversion/textual_inversion.py", line 49, in const_hash
r = (r * 281 ^ int(v) * 997) & 0xFFFFFFFF
ValueError: cannot convert float NaN to integer
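For reference, the crash itself is reproducible in isolation: once any element of the trained vector becomes NaN, the int(v) cast inside const_hash cannot convert it. A minimal sketch of the failing pattern (plain Python, not the webui code; the vector below is a made-up stand-in for the diverged embedding):

def const_hash(values):
    # same xor-fold hashing pattern as in textual_inversion.py
    r = 0
    for v in values:
        r = (r * 281 ^ int(v) * 997) & 0xFFFFFFFF  # int() on NaN raises ValueError
    return r

vec = [0.1, float("nan"), 0.3]  # stand-in for an embedding vector after the loss goes NaN
const_hash(x * 100 for x in vec)  # ValueError: cannot convert float NaN to integer

So the checksum failure is just where the NaN first becomes fatal; the embedding vector already contains NaN by the time the preview image is generated.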
Using a fixed learning rate like learn_rate = "0.005" works without issues.
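For context, this is how I read the variable-rate string: a comma-separated list of rate:step pairs, where each rate applies up to the given step. A rough sketch of that interpretation (my own illustration, not the actual code from PR #1795):

def parse_learn_rate(schedule: str):
    # interpret "rate:step, rate:step, ..." as (learning rate, last step it applies to) pairs;
    # a bare "rate" with no step is treated as applying until the end of training
    pairs = []
    for part in schedule.split(","):
        rate, _, step = part.strip().partition(":")
        pairs.append((float(rate), int(step) if step else None))
    return pairs

print(parse_learn_rate("0.005:50, 0.001:100, 0.0005:500"))
# [(0.005, 50), (0.001, 100), (0.0005, 500)]
print(parse_learn_rate("0.005"))
# [(0.005, None)]

The console output during training matches this reading: each rate change is logged at the expected step, so the parsing side appears to work; only the training itself goes to NaN.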
Steps to reproduce the problem
Go to Train -> Train -> Train Embedding
Change the Embedding Learning Rate field to anything that includes a variable rate
What should have happened?
Training should complete without errors
Commit where the problem happens
685f9631b56ff8bd43bce24ff5ce0f9a0e9af490
What platforms do you use to access the UI?
Windows, Linux
What browsers do you use to access the UI?
Google Chrome, Microsoft Edge
Command Line Arguments
No response
Additional information, context and logs
No response