Closed: StableInquest closed this issue 1 year ago.
I'm not sure this is an M1 macOS issue; I also get this on Linux with my 3090 Ti.
Traceback (most recent call last):
  File "/mnt/pictures/stable-diffusion-webui/modules/ui.py", line 215, in f
    res = list(func(*args, **kwargs))
  File "/mnt/pictures/stable-diffusion-webui/webui.py", line 64, in f
    res = func(*args, **kwargs)
  File "/mnt/pictures/stable-diffusion-webui/modules/textual_inversion/ui.py", line 31, in train_embedding
    embedding, filename = modules.textual_inversion.textual_inversion.train_embedding(*args)
  File "/mnt/pictures/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 268, in train_embedding
    loss = shared.sd_model(x, c)[0]
  File "/mnt/pictures/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1357, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/pictures/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 879, in forward
    return self.p_losses(x, c, t, *args, **kwargs)
  File "/mnt/pictures/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1030, in p_losses
    logvar_t = self.logvar[t].to(self.device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
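For what it's worth, the error is easy to reproduce outside the webui: indexing a CPU tensor with indices that live on another device triggers the same message on recent PyTorch builds (a minimal sketch; the exact wording of the error varies by version):

    import torch

    logvar = torch.zeros(1000)            # stays on the CPU, like self.logvar in ddpm.py
    t = torch.randint(0, 1000, (4,))      # timestep indices

    # put the indices on whatever accelerator is present; on a CPU-only box this is a no-op
    if torch.cuda.is_available():
        t = t.to("cuda")
    elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
        t = t.to("mps")

    try:
        logvar[t]                         # indices on a different device than the indexed tensor
    except RuntimeError as err:
        print(err)                        # "indices should be either on cpu or on the same device ..."

    print(logvar[t.cpu()])                # moving the indices back to the CPU is the workaround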
Any luck resolving this yet?
I started running into this (Win10) after changing PyTorch versions while trying to troubleshoot a 'CUDA not available' issue.
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
fixed it for me.
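A quick sanity check after reinstalling (run from the webui's Python environment) to confirm the CUDA build is actually the one being picked up:

    import torch

    print(torch.__version__)            # expect something like 1.12.1+cu113
    print(torch.cuda.is_available())    # should print True once the right build is installed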
Thanks for the tip. I attempted it, but I don't think those versions are supported on the ARM (M1 Mac) architecture.
Editing repositories/stable-diffusion/ldm/models/diffusion/ddpm.py and adding a new line above line 1030 with the code t = t.to('cpu') fixes this issue. However, training then takes 32+ GB of RAM, which (with my hardware) forces it into swap. That 1) is extremely slow, 2) rapidly wears out the SSD, and 3) does not work anyway: the loss shows as NaN, and when you then try to use the result as an embedding it throws "cannot convert float NaN to integer".
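For anyone unsure where exactly the new line goes, the patched region of p_losses() ends up looking roughly like this (surrounding code taken from the traceback above; only the first line is added):

    # repositories/stable-diffusion/ldm/models/diffusion/ddpm.py, inside p_losses()
    t = t.to('cpu')                              # added: self.logvar lives on the CPU, so the indices must too
    logvar_t = self.logvar[t].to(self.device)    # original line ~1030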
Anyone get this working?
After updating both this repo and PyTorch, new errors appeared:
File "modules/textual_inversion/textual_inversion.py", line 306, in train_embedding
loss.backward()
File "~/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "~/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [MPSFloatType [1, 77, 768]], which is output 0 of MulBackward0, is at version 2; expected version 1 instead.
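For context, this class of error is straightforward to reproduce on its own: autograd refuses to backpropagate through a tensor it saved for the backward pass if that tensor was later changed in place (an unrelated minimal illustration, not the webui code path):

    import torch

    x = torch.ones(3, requires_grad=True)
    y = x * 2            # y is "output 0 of MulBackward0"
    z = y ** 2           # pow saves y in order to compute its gradient
    y.add_(1)            # the in-place change bumps y's version counter
    z.sum().backward()   # RuntimeError: one of the variables needed for gradient computation
                         # has been modified by an inplace operation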
So it was a PyTorch issue. Downgrading to PyTorch 1.12.1 helped solve both the bug above and the NaN issue, but the memory consumption of this repo's implementation is still huge. With the --medvram flag it crashes at the same loss.backward() line as above, with RuntimeError: Placeholder storage has not been allocated on MPS device. This can probably be fixed by assigning the correct device, but I'm not sure where exactly.
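If anyone wants to dig into that, "assigning the correct device" basically means making sure the tensors that go into the forward pass live on the same device as the model's weights. A hypothetical helper along those lines (an illustration, not actual webui code):

    import torch

    def ensure_same_device(model: torch.nn.Module, *tensors: torch.Tensor):
        # Move the inputs onto whatever device the model's parameters sit on
        # (cpu / cuda / mps) before running the forward pass.
        device = next(model.parameters()).device
        return tuple(t.to(device) for t in tensors)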
I tried to create an embedding with other repos to use it later with this UI. Training data: 8 images.
1) The M1-optimized version of stable-diffusion: 7.86 s/it, 2 epochs (~1000 steps), ~25 GB max RAM consumption.
2) An alternative UI with fewer features but an active Mac/M1 community: 4.35 s/it, 1.5 epochs (1000 steps), ~20 GB max RAM consumption.
Training: an example command from here can be used for both repos; for InvokeAI, just replace the base parameter with v1-m1-finetune.yaml.
I got it going this way:
Move to these newer (nightly) versions:
pip3 install --pre torch==1.14.0.dev20221101 torchaudio==0.14.0.dev20221101 torchtext==0.14.0.dev20221101 torchvision==0.15.0.dev20221101 -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html --no-deps #for normal use
and in miniconda/envs/web-ui/lib/python3.10/site-packages/torch/_tensor.py edit this:

    if self.device.type in ['xla', 'ort', 'mps', 'hpu']:
        return (torch._utils._rebuild_device_tensor_from_numpy, (self.cpu().numpy(),
                                                                 self.dtype,
                                                                 str(self.device),
                                                                 self.requires_grad))

to this:

    if self.device.type in ['xla', 'ort', 'mps', 'hpu']:
        return (torch._utils._rebuild_device_tensor_from_numpy, (self.cpu().detach().numpy(),
                                                                 self.dtype,
                                                                 str(self.device),
                                                                 self.requires_grad),)
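The extra .detach() presumably matters because PyTorch refuses to convert a tensor that still requires grad straight to NumPy; a minimal illustration of that behaviour:

    import torch

    t = torch.ones(3, requires_grad=True)
    # t.numpy()                 # RuntimeError: Can't call numpy() on Tensor that requires grad
    arr = t.detach().numpy()    # detaching from the graph first is exactly what the edit adds
    print(arr)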
With the configuration above, memory usage during training sits at about 35-45 GB using mostly defaults (512x512, etc.).
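Before kicking off a long run with the nightly build, you can also confirm that this torch actually sees the Metal backend (assumes torch 1.12 or newer, where torch.backends.mps exists):

    import torch

    print(torch.__version__)
    print(torch.backends.mps.is_built())       # was this build compiled with MPS support?
    print(torch.backends.mps.is_available())   # can MPS actually be used on this machine?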
Hi friend, can you elaborate on how to do this step? I have the same problem after adding t = t.to('cpu'), thanks!
@MagicTide which problem exactly do you have? I was not able to run it with --medvram and I don't have enough RAM to use it without this flag.
Just like this, I can't train it. And I have added t = t.to('cpu')
And it fails just like this (macOS, M1 Max, 32 GB):
Applying cross attention optimization (InvokeAI).
Error completing request
Arguments: ('testjack', '0.005', 1, '/Users/XXX/Documents/traintest_out', 'textual_inversion', 512, 512, 20000, 100, 500, '/Users/XXX/Documents/stable-diffusion-webui/textual_inversion_templates/subject_filewords.txt', True, False, 'testjack', '', 20, 0, 7, -1.0, 512, 512) {}
Traceback (most recent call last):
  File "/Users/XXX/Documents/stable-diffusion-webui/modules/ui.py", line 185, in f
    res = list(func(*args, **kwargs))
  File "/Users/XXX/Documents/stable-diffusion-webui/webui.py", line 56, in f
    res = func(*args, **kwargs)
  File "/Users/XXX/Documents/stable-diffusion-webui/modules/textual_inversion/ui.py", line 33, in train_embedding
    embedding, filename = modules.textual_inversion.textual_inversion.train_embedding(*args)
  File "/Users/XXX/Documents/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 303, in train_embedding
    loss = shared.sd_model(x, c)[0]
  File "/Users/XXX/Documents/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/XXX/Documents/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 879, in forward
    return self.p_losses(x, c, t, *args, **kwargs)
  File "/Users/XXX/Documents/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1030, in p_losses
    logvar_t = self.logvar[t].to(self.device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
And if I comment out the git pull and run it again, I get the same in-place operation error you posted above.
@MagicTide here is another fix for the CPU problem; the last bug can be avoided by downgrading, as I said before. But if you have a 32 GB device, don't waste your time: just try to train the inversion with InvokeAI (unless you have an idea for how to make the code in this repo use less RAM).
Thx 😄
This is an old issue that has been fixed and no longer seems relevant. If future issues turn out to be related, please refer back to this one.
When starting training under the Textual Inversion tab, training fails. It loads the pre-processed images but then aborts with the error above.
It should instead continue training through the number of steps specified by the user.