Rudy34160 opened this issue 1 year ago
Out of VRAM
> Out of VRAM
I had deduced the same type of error from the message. But it's still surprising on Google Colab, right? I have never encountered this type of problem until now while training my textual inversions. Could one of the parameters I modified be set incorrectly and consuming too much VRAM? Which one? 🤔
At first glance I'd say the learning rate; 0.9 looks extremely high to me. Also... is 10 your batch size?
Yes, 10 is the batch size; I've never had a crash with this setting. So indeed, it could come from the learning rate. Despite much reading on the subject, I haven't yet found any details: is there a limit? (I thought we were working in a range between 0 and 1.) Without further information, I'm fumbling through empirical tests...
Hmm, well, Colab has been fickle for the last day or so, so I don't know. I'm also just learning, but that's what stuck out to me: I've just never seen anyone use a learning rate of 0.9. Most I see are in the third decimal place. I don't know whether this would affect VRAM usage, though... unless it's like telling it to read the entirety of ten books at once and then give an oral book review.
Edit: just tried a run myself, and for me Colab/SD consistently crashes on any batch size larger than two; even two is a fifty-fifty chance it nosedives into a CUDA VRAM crash.
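If it helps to rule things out, here is a minimal sketch to run in a Colab cell before training, just to see how much VRAM the runtime actually grants (assuming a reasonably recent PyTorch; `torch.cuda.mem_get_info` may be missing on older builds, hence the fallback):

```python
import torch

# Quick check of what the Colab runtime actually grants, run in a cell
# before training. Hedged: torch.cuda.mem_get_info() may be absent on
# older PyTorch builds, hence the fallback to device properties.
if torch.cuda.is_available():
    try:
        free, total = torch.cuda.mem_get_info()
        print(f"free VRAM:  {free / 1024**3:.2f} GiB")
        print(f"total VRAM: {total / 1024**3:.2f} GiB")
    except AttributeError:
        total = torch.cuda.get_device_properties(0).total_memory
        print(f"total VRAM: {total / 1024**3:.2f} GiB")
    # What this PyTorch process has already taken for itself:
    print(f"allocated by PyTorch: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
    print(f"reserved by PyTorch:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
else:
    print("No CUDA device visible")
```

If the free figure looks healthy and it still dies at batch size 2, something loaded earlier in the notebook is holding memory. Otherwise, dropping the batch size and raising the gradient accumulation steps (the trainer exposes a `gradient_step` argument, visible in the traceback below) is one way to keep the effective batch without the peak usage.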
> [...] I've just never seen anyone use a learning rate of 0.9.

That's why I want to test it for myself. Having found no writing on the subject, I want to see what it can give...
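For what it's worth, and as a hedge since I'm quoting the syntax from memory of the A1111 wiki rather than the source: most textual inversion guides train at around 0.005, orders of magnitude below 0.9, and the webui's "Embedding Learning rate" field is documented to also accept a stepped schedule instead of a single number, e.g.:

```
0.005                        # the commonly cited constant rate
0.05:100, 0.01:1000, 0.005   # assumed schedule syntax: 0.05 until step
                             # 100, 0.01 until step 1000, then 0.005
```

As far as I can tell the rate itself shouldn't change VRAM usage, so the OOM is more likely tied to the batch size or to what the Colab runtime hands out.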
> Edit: just tried a run myself, and for me Colab/SD consistently crashes on any batch size larger than two; even two is a fifty-fifty chance it nosedives into a CUDA VRAM crash.

I don't know Colab well enough to determine the cause of this type of crash... Perhaps someone more expert can enlighten us? It could just be a temporary technical fault.
I'm guessing Colab is doing something behind the curtains, because it has been twitchy these last 24 hours.
> Edit: just tried a run myself, and for me Colab/SD consistently crashes on any batch size larger than two; even two is a fifty-fifty chance it nosedives into a CUDA VRAM crash.
No way... Same error on each of my attempts.... 😥 We'll wait for it to work again...
News from the front?
Anyone have another Colab to recommend for training embeddings while waiting for this to recover?
> I'm guessing Colab is doing something behind the curtains, because it has been twitchy these last 24 hours.
Do you know where to find a good tutorial for creating your own version of an SD Colab in the meantime?
Try Runpod notebooks; they might not crash under the high VRAM requirements.
> Try Runpod notebooks; they might not crash under the high VRAM requirements.
Not free, for my tests... 😁😉 But while waiting for things to improve, I'm running this one: https://github.com/camenduru/stable-diffusion-webui-colab.
Since last night, I've been getting the error below when I launch a training run. I never had a problem until now... Any idea what I'm doing wrong?
```
Training at rate of 0.9 until step 1000
Preparing dataset...
 50% 10/20 [00:05<00:05, 1.89it/s]
Error completing request
Arguments: ('task(dvl5eia1y9q5fxe)', 'testai01', '0.9', 10, 1, '/content/gdrive/MyDrive/sd/txt/', 'textual_inversion', 512, 512, True, 1000, 'disabled', '0.1', False, 0, 'deterministic', 20, 20, 'humansubject_filewords.txt', True, True, 'portrait', '', 35, 14, 7, -1.0, 512, 512) {}
Traceback (most recent call last):
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/textual_inversion/ui.py", line 33, in train_embedding
    embedding, filename = modules.textual_inversion.textual_inversion.train_embedding(*args)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 413, in train_embedding
    ds = modules.textual_inversion.dataset.PersonalizedBase(data_root=data_root, width=training_width, height=training_height, repeats=shared.opts.training_image_repeats_per_epoch, placeholder_token=embedding_name, model=shared.sd_model, cond_model=shared.sd_model.cond_stage_model, device=devices.device, template_file=template_file, batch_size=batch_size, gradient_step=gradient_step, shuffle_tags=shuffle_tags, tag_drop_out=tag_drop_out, latent_sampling_method=latent_sampling_method, varsize=varsize)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/textual_inversion/dataset.py", line 88, in __init__
    latent_dist = model.encode_first_stage(torchdata.unsqueeze(dim=0))
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/sd_hijack_utils.py", line 28, in __call__
    return self.orig_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/content/gdrive/MyDrive/sd/stablediffusion/ldm/models/diffusion/ddpm.py", line 830, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "/content/gdrive/MyDrive/sd/stablediffusion/ldm/models/autoencoder.py", line 83, in encode
    h = self.encoder(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/gdrive/MyDrive/sd/stablediffusion/ldm/modules/diffusionmodules/model.py", line 526, in forward
    h = self.down[i_level].block[i_block](hs[-1], temb)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/gdrive/MyDrive/sd/stablediffusion/ldm/modules/diffusionmodules/model.py", line 131, in forward
    h = self.norm1(h)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2528, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 0; 14.76 GiB total capacity; 6.37 GiB already allocated; 7.27 GiB free; 6.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
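Following the hint at the end of that traceback, one low-effort thing to try is setting the allocator config before the webui process starts, e.g. at the top of the Colab launch cell. A minimal sketch; the 512 MiB split size is an arbitrary assumption, not a tuned value:

```python
import os

# Must be set before PyTorch initializes CUDA, so put it at the top of
# the cell that launches the webui. The OOM message suggests
# max_split_size_mb when reserved memory far exceeds allocated memory;
# 512 here is an arbitrary assumption, not a tuned value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```

That only mitigates fragmentation, though. The message shows a single 8.00 GiB allocation failing on a 14.76 GiB card, which points at the batch of 10 at 512×512 simply being too large for whatever GPU Colab assigned; reducing the batch size is the more direct fix.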