Open billium99 opened 1 year ago
With the newest update, training is also broken.
Most likely the training process itself produced NaN values (as happens when you choose an extremely high learning rate, for example). And yes, webui should display better error messages when that happens.
Check the training log to confirm.
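One way to do that check is to scan the logged loss values for the first NaN. The sketch below is a minimal illustration using only the standard library; the CSV layout and the column names `step` and `loss` are assumptions about what the log looks like, so adjust them to match the file in your log directory. The sample data is made up.

```python
import csv
import io
import math


def first_nan_step(log_file, step_col="step", loss_col="loss"):
    """Return the step of the first row whose loss is NaN, or None if no loss is NaN.

    The column names are assumptions; change them to match your log.
    """
    for row in csv.DictReader(log_file):
        try:
            loss = float(row[loss_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric loss
        if math.isnan(loss):
            return int(row[step_col])
    return None


# Made-up sample data standing in for a real log file.
sample_log = io.StringIO(
    "step,loss\n"
    "10,0.1312\n"
    "20,0.1178\n"
    "30,nan\n"
    "40,nan\n"
)
print(first_nan_step(sample_log))
```

For a real file, open it with `open(path, newline="")` and pass the file object in. If a step comes back, the embedding saved after that point is probably unusable.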
I started using commit d8f8bcb821fa62e943eb95ee05b8a949317326fe for now, and it works well.
I have the same issue. Any ideas?
I think vladmandic was correct in my case. Automatic continues to work flawlessly except for my trainings. I can even deploy others' trainings and use them just fine, but something about my process for creating my own pt file remains the problem. I haven't resolved it, but I also haven't had time to start again from scratch. The first thing I'm going to try is fewer images. I was using 25, which doubled to 50 with the mirrored versions. I think 50 might be causing memory issues, or there's something else I missed. I'm going to watch the logs more closely as well.
I had this problem as well. I tried changing the prompts, changing the images, reducing the number of images, changing the epochs and learning rates, you name it. I got the bug again and again, right before a preview of current progress was rendered, after the loss started coming out as "NaN".
Then I realized I was training on a model OTHER THAN the Stable Diffusion default. When I switched back to 1.5 pruned EMA-only, everything worked, again and again.
I went back to training on another model (I tried the Deliberate 2 model), and the NaN bug showed up again.
In short, if you're having this issue, check which model you have loaded; it's a likely cause.
This fixed it for me! So easy to overlook. Thanks
Is there an existing issue for this?
What happened?
After successfully training an embedding pt file and placing it in the embeddings directory, I now get "ValueError: cannot convert float NaN to integer" when attempting to generate an image with my embedding word in the prompt.
This is an M1 Mac Studio with 32 GB of RAM.
Steps to reproduce the problem
Train on a folder of images (in my case there were 50 images). Here are the settings I used to train:
{
"datetime": "2023-01-16 23:25:01",
"model_name": "v2-1_768-ema-pruned",
"model_hash": "ad2a33c361",
"num_of_dataset_images": 50,
"num_vectors_per_token": 15,
"embedding_name": "Robryde23",
"learn_rate": "0.05:10,0.02:20,0.01:60,0.005:200,0.002:500,0.001:3000,0.0005",
"batch_size": 22,
"gradient_step": 2,
"data_root": "/Volumes/LaCie/Robyns Wedding/Robyn Trainer Shots/New-Resized",
"log_directory": "textual_inversion/2023-01-16/Robryde23",
"training_width": 512,
"training_height": 512,
"steps": 3000,
"clip_grad_mode": "disabled",
"clip_grad_value": "0.1",
"latent_sampling_method": "deterministic",
"create_image_every": 50,
"save_embedding_every": 50,
"save_image_with_stored_embedding": true,
"template_file": "/Users/williamhenderson/stable-diffusion-webui/textual_inversion_templates/custom_subject_filewords.txt",
"initial_step": 5
}
Run the training and get a "successfully trained" message
Move the pt file to the stable-diffusion-webui/embeddings folder
Create a new prompt in txt2img using my "trigger" word for the embedding
Image creation fails with the error: "ValueError: cannot convert float NaN to integer"
Create an image without the embedding word in the prompt, and SD appears to be fully functional
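For anyone reading the settings in step 1: the `learn_rate` value is a stepped schedule rather than a single number. As far as I understand the webui syntax, each `rate:step` pair means "use this rate until that step", and a trailing rate with no step applies for the rest of training. The sketch below is a hypothetical parser written only to make the schedule readable; the names `parse_schedule` and `rate_at` are made up, and the handling of the exact boundary step is an assumption, not the webui's own code.

```python
def parse_schedule(text):
    """Parse 'rate:step,rate:step,...,rate' into a list of (rate, last_step) pairs.

    last_step is None for a trailing rate that has no step limit.
    """
    schedule = []
    for part in text.split(","):
        rate, _, last_step = part.strip().partition(":")
        schedule.append((float(rate), int(last_step) if last_step else None))
    return schedule


def rate_at(schedule, step):
    """Return the rate in effect at a given step (boundary handling is an assumption)."""
    for rate, last_step in schedule:
        if last_step is None or step <= last_step:
            return rate
    return schedule[-1][0]  # past the last listed step: keep the final rate


schedule = parse_schedule("0.05:10,0.02:20,0.01:60,0.005:200,0.002:500,0.001:3000,0.0005")
for step in (5, 15, 100, 2500):
    print(step, rate_at(schedule, step))
```

Read this way, the run starts at 0.05, the highest rate in the schedule, and with "steps": 3000 the trailing 0.0005 is never reached.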
What should have happened?
I believe it should have rendered a new image using my training data because of the embedding trigger prompt text.
Commit where the problem happens
ff6a5bcec1ce25aa8f08b157ea957d764be23d8d
What platforms do you use to access UI ?
MacOS
What browsers do you use to access the UI ?
Google Chrome
Command Line Arguments
Additional information, context and logs
Log output from my failed image creation:
Textual inversion embeddings loaded(1): Robryde23-50
100%|███████████████████████████████████████████| 20/20 [00:49<00:00, 2.47s/it]
Total progress: 100%|███████████████████████████| 20/20 [00:34<00:00, 1.74s/it]
100%|███████████████████████████████████████████| 20/20 [00:24<00:00, 1.22s/it]
Total progress: 100%|███████████████████████████| 20/20 [00:19<00:00, 1.02it/s]
Error completing request
Arguments: ('task(stl5bm10q0zly1e)', 'a photograph of Robryde23-50', 'ugly, distorted face', 'None', 'None', 20, 0, False, False, 1, 1, 18, -1.0, -1.0, 0, 0, 0, False, 512, 768, False, 0.7, 2, 'Latent', 0, 0, 0, 0, False, False, False, False, '', 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "/Users/williamhenderson/stable-diffusion-webui/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/Users/williamhenderson/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/Users/williamhenderson/stable-diffusion-webui/modules/txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "/Users/williamhenderson/stable-diffusion-webui/modules/processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "/Users/williamhenderson/stable-diffusion-webui/modules/processing.py", line 598, in process_images_inner
    c = get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, p.steps, cached_c)
  File "/Users/williamhenderson/stable-diffusion-webui/modules/processing.py", line 565, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/Users/williamhenderson/stable-diffusion-webui/modules/prompt_parser.py", line 205, in get_multicond_learned_conditioning
    learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps)
  File "/Users/williamhenderson/stable-diffusion-webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/Users/williamhenderson/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/Users/williamhenderson/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/williamhenderson/stable-diffusion-webui/modules/sd_hijack_clip.py", line 233, in forward
    embeddings_list = ", ".join([f'{name} [{embedding.checksum()}]' for name, embedding in used_embeddings.items()])
  File "/Users/williamhenderson/stable-diffusion-webui/modules/sd_hijack_clip.py", line 233, in <listcomp>
    embeddings_list = ", ".join([f'{name} [{embedding.checksum()}]' for name, embedding in used_embeddings.items()])
  File "/Users/williamhenderson/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 83, in checksum
    self.cached_checksum = f'{const_hash(self.vec.reshape(-1) * 100) & 0xffff:04x}'
  File "/Users/williamhenderson/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 80, in const_hash
    r = (r * 281 ^ int(v) * 997) & 0xFFFFFFFF
ValueError: cannot convert float NaN to integer
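The last two frames show where the error comes from: the checksum routine converts every element of the embedding vector to an integer, and Python's `int()` refuses a NaN. The sketch below reproduces that with plain Python floats instead of a tensor. `const_hash` is modelled on the line in the traceback, while `has_nan` and the sample values are hypothetical and added only for illustration.

```python
import math


def const_hash(values):
    """Rolling hash modelled on the const_hash line in the traceback."""
    r = 0
    for v in values:
        r = (r * 281 ^ int(v) * 997) & 0xFFFFFFFF
    return r


def has_nan(values):
    """Hypothetical pre-check: True if any element is NaN."""
    return any(math.isnan(v) for v in values)


# Made-up vectors, scaled by 100 as in the checksum line.
good = [0.12 * 100, -0.5 * 100, 3.0 * 100]
bad = [0.12 * 100, float("nan"), 3.0 * 100]

print(has_nan(good), has_nan(bad))
print(f"{const_hash(good) & 0xffff:04x}")  # hashes fine when every value is finite

try:
    const_hash(bad)
except ValueError as err:
    print("ValueError:", err)
```

So the generation step is only where the problem surfaces. The NaN values would already be in the saved embedding, which fits the suggestion that training itself diverged.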