AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

M1 MacOS - Training Textual Inversion fails #2685

Closed: StableInquest closed this issue 1 year ago

StableInquest commented 2 years ago

When starting training under the Textual Inversion tab, training fails. It loads the pre-processed images but then fails with the following error:

Traceback (most recent call last):
  File "/Users/user/stable-diffusion-webui/modules/ui.py", line 187, in f
    res = list(func(*args, **kwargs))
  File "/Users/user/stable-diffusion-webui/webui.py", line 64, in f
    res = func(*args, **kwargs)
  File "/Users/user/stable-diffusion-webui/modules/textual_inversion/ui.py", line 31, in train_embedding
    embedding, filename = modules.textual_inversion.textual_inversion.train_embedding(*args)
  File "/Users/user/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 268, in train_embedding
    loss = shared.sd_model(x.unsqueeze(0), c)[0]
  File "/Users/user/miniconda/envs/web-ui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/user/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 879, in forward
    return self.p_losses(x, c, t, *args, **kwargs)
  File "/Users/user/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1030, in p_losses
    logvar_t = self.logvar[t].to(self.device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Instead, training should continue for the number of steps specified by the user.
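
For context, the failure seems to come from self.logvar being created as a plain CPU tensor in ddpm.py while the sampled timesteps t live on the MPS device. A minimal repro of the same error, assuming an MPS or CUDA build of PyTorch:

    import torch

    # logvar is a plain CPU tensor on the model, while the timestep indices
    # are sampled on the accelerator; indexing then fails exactly as above.
    device = "mps" if torch.backends.mps.is_available() else "cuda"
    logvar = torch.zeros(1000)                       # stays on the CPU
    t = torch.randint(0, 1000, (1,), device=device)  # indices on the MPS/CUDA device
    logvar_t = logvar[t]  # RuntimeError: indices should be either on cpu or on the
                          # same device as the indexed tensor (cpu)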

ChronoStriker1 commented 2 years ago

I'm not sure this is an M1 macOS issue; I also get this on Linux with my 3090 Ti.

Traceback (most recent call last):
  File "/mnt/pictures/stable-diffusion-webui/modules/ui.py", line 215, in f
    res = list(func(*args, **kwargs))
  File "/mnt/pictures/stable-diffusion-webui/webui.py", line 64, in f
    res = func(*args, **kwargs)
  File "/mnt/pictures/stable-diffusion-webui/modules/textual_inversion/ui.py", line 31, in train_embedding
    embedding, filename = modules.textual_inversion.textual_inversion.train_embedding(*args)
  File "/mnt/pictures/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 268, in train_embedding
    loss = shared.sd_model(x, c)[0]
  File "/mnt/pictures/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1357, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/pictures/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 879, in forward
    return self.p_losses(x, c, t, *args, **kwargs)
  File "/mnt/pictures/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1030, in p_losses
    logvar_t = self.logvar[t].to(self.device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

StableInquest commented 2 years ago

Any luck resolving this yet?

cobryan05 commented 2 years ago

I started running into this (Win10) after changing PyTorch versions while trying to troubleshoot a 'CUDA not available' issue.

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

fixed it for me.
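
If you're not sure which build actually got installed, a quick sanity check from Python:

    import torch
    print(torch.__version__)          # should report something like 1.12.1+cu113
    print(torch.cuda.is_available())  # should be True once the cu113 wheels are in place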

StableInquest commented 2 years ago

Thanks for the tip. I attempted it, but I don't think those versions are supported on the ARM (M1 Mac) architecture.

remixer-dec commented 2 years ago

Editing repositories/stable-diffusion/ldm/models/diffusion/ddpm.py and adding a new line above line 1030 with t = t.to('cpu') fixes this issue. However, training then takes 32+ GB of RAM, which (with my hardware) pushes it into swap, and that 1) is extremely slow, 2) rapidly wears out the SSD, and 3) does not work anyway: it shows the loss as nan, and when you try to use the result as an embedding it throws "cannot convert float NaN to integer".
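
For anyone who wants to try it, the patch looks roughly like this (line numbers may have shifted in newer checkouts):

    # repositories/stable-diffusion/ldm/models/diffusion/ddpm.py, inside LatentDiffusion.p_losses()
    t = t.to('cpu')                            # new line: move the timestep indices to the CPU, where self.logvar lives
    logvar_t = self.logvar[t].to(self.device)  # existing line (~1030) that raises the error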

StableInquest commented 2 years ago

Anyone get this working?

remixer-dec commented 2 years ago

After updating both this repo and pytorch, new errors appeared.

File "modules/textual_inversion/textual_inversion.py", line 306, in train_embedding
    loss.backward()
  File "~/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "~/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [MPSFloatType [1, 77, 768]], which is output 0 of MulBackward0, is at version 2; expected version 1 instead.

remixer-dec commented 2 years ago

So it was a PyTorch issue. Downgrading to PyTorch 1.12.1 solved both the bug above and the nan issue, but the memory consumption of this repo's implementation is still huge. With the --medvram flag it crashes at the same loss.backward() line as above, with RuntimeError: Placeholder storage has not been allocated on MPS device; this can probably be fixed by assigning the correct device, but I'm not sure where exactly.
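
If someone wants to experiment with that, the place to look is probably the training loop in modules/textual_inversion/textual_inversion.py around the line in the traceback. A purely speculative sketch, untested, and the variable names may differ between versions:

    from modules import devices, shared

    # inside train_embedding(), just before the forward pass from the traceback
    x = x.to(devices.device)                             # make sure the latents are on the MPS device
    c = c.to(devices.device) if hasattr(c, "to") else c  # and the conditioning, if it is a tensor
    loss = shared.sd_model(x, c)[0]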

I tried to create an embedding with other repos, to use later with this UI. Training data: 8 images.

1) Optimized-for-M1 version of stable-diffusion: 7.86 s/it, 2 epochs (~1000 steps), ~25 GB max RAM consumption.

Results: training sample - generated sample. It worked! To get better results I guess I need to train it for more steps. The results are stored in the logs directory. That repo requires Python 3.10, but it is not hard to fix compatibility with earlier versions. Also, this is an important fix that helps reduce the random nan issues a lot; both repos have it, and maybe this web UI needs it too once the other issues are solved.

2) Alternative UI with fewer features but an active Mac/M1 community: 4.35 s/it, 1.5 epochs (1000 steps), ~20 GB max RAM consumption.

Results: training sample - generated sample.

Training: An example command from here can be used for both repos; just replace the base parameter with v1-m1-finetune.yaml for InvokeAI.

StableInquest commented 2 years ago

I got it going this way:

Move to these newer versions: pip3 install --pre torch==1.14.0.dev20221101 torchaudio==0.14.0.dev20221101 torchtext==0.14.0.dev20221101 torchvision==0.15.0.dev20221101 -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html --no-deps  # for normal use

and in miniconda/envs/web-ui/lib/python3.10/site-packages/torch/_tensor.py, edit this:

        if self.device.type in ['xla', 'ort', 'mps', 'hpu']:
            return (torch._utils._rebuild_device_tensor_from_numpy, (self.cpu().numpy(),
                                                                     self.dtype,
                                                                     str(self.device),
                                                                     self.requires_grad))

to this (adding .detach() so the tensor no longer requires grad when it is converted to NumPy):

        if self.device.type in ['xla', 'ort', 'mps', 'hpu']:
            return (torch._utils._rebuild_device_tensor_from_numpy, (self.cpu().detach().numpy(),
                                                                     self.dtype,
                                                                     str(self.device),
                                                                     self.requires_grad),)
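
The .detach() matters because calling .numpy() on a tensor that still requires grad raises an error; a short illustration:

    import torch
    t = torch.ones(3, requires_grad=True)
    # t.cpu().numpy()             # RuntimeError: Can't call numpy() on Tensor that requires grad
    a = t.cpu().detach().numpy()  # fine: detach from the graph first, then convert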

StableInquest commented 2 years ago

With the configuration above, memory usage during training sits at about 35-45 GB, using mostly defaults (512x512, etc.).

MagicTide commented 1 year ago

> So it was a PyTorch issue. Downgrading to PyTorch 1.12.1 solved both the bug above and the nan issue. [...]
> Training: An example command from here can be used for both repos; just replace the base parameter with v1-m1-finetune.yaml for InvokeAI.

Hi friend, can you elaborate on how to do this step? I have the same problem after adding t = t.to('cpu'). Thanks!

remixer-dec commented 1 year ago

@MagicTide which problem exactly do you have? I was not able to run it with --medvram and I don't have enough RAM to use it without this flag.

MagicTide commented 1 year ago

> @MagicTide which problem exactly do you have? I was not able to run it with --medvram and I don't have enough RAM to use it without this flag.

Just like this: I can't train it, and I have added t = t.to('cpu').

MagicTide commented 1 year ago

> @MagicTide which problem exactly do you have? I was not able to run it with --medvram and I don't have enough RAM to use it without this flag.

And just like this (macOS, M1 Max 32):

Applying cross attention optimization (InvokeAI).
Error completing request
Arguments: ('testjack', '0.005', 1, '/Users/XXX/Documents/traintest_out', 'textual_inversion', 512, 512, 20000, 100, 500, '/Users/XXX/Documents/stable-diffusion-webui/textual_inversion_templates/subject_filewords.txt', True, False, 'testjack', '', 20, 0, 7, -1.0, 512, 512) {}
Traceback (most recent call last):
  File "/Users/XXX/Documents/stable-diffusion-webui/modules/ui.py", line 185, in f
    res = list(func(*args, **kwargs))
  File "/Users/XXX/Documents/stable-diffusion-webui/webui.py", line 56, in f
    res = func(*args, **kwargs)
  File "/Users/XXX/Documents/stable-diffusion-webui/modules/textual_inversion/ui.py", line 33, in train_embedding
    embedding, filename = modules.textual_inversion.textual_inversion.train_embedding(*args)
  File "/Users/XXX/Documents/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 303, in train_embedding
    loss = shared.sd_model(x, c)[0]
  File "/Users/XXX/Documents/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/XXX/Documents/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 879, in forward
    return self.p_losses(x, c, t, *args, **kwargs)
  File "/Users/XXX/Documents/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1030, in p_losses
    logvar_t = self.logvar[t].to(self.device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

MagicTide commented 1 year ago

> After updating both this repo and PyTorch, new errors appeared. [...]
> RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [MPSFloatType [1, 77, 768]], which is output 0 of MulBackward0, is at version 2; expected version 1 instead.

And if I comment out the git pull and run it again, I get the same error as this one of yours…

remixer-dec commented 1 year ago

@MagicTide here is another fix for the CPU problem; the last bug can be avoided by downgrading, as I said before. But if you have a 32 GB device, don't waste your time: just try to train the inversion with InvokeAI (unless you have an idea of how to make the code in this repo use less RAM).

MagicTide commented 1 year ago

Thx 😄

ClashSAN commented 1 year ago

This is an old issue which has been fixed and no longer seems relevant. If future issues are related to this one, please refer back to it.