AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0
141.35k stars 26.72k forks

[Bug]: M1 Mac textual inversion extremely slow #4806

Open HamsterGerbil opened 1 year ago

HamsterGerbil commented 1 year ago

Is there an existing issue for this?

What happened?

First, I have to say thank you AUTOMATIC1111 and devs for your incredible work. This is an incredible tool. Right now I'm trying to use textual inversion and/or hypernetwork training, but after creating the embedding and preprocessing my images, each training step takes approximately thirty seconds for both textual inversion and hypernetworks.

Any help would be greatly appreciated. I really love the webui and textual inversion is simply astonishing.

Steps to reproduce the problem

  1. Go to training on the webui.
  2. Create an image embedding, then go to preprocess images and process 512 x 512 images from a source directory into a destination directory.
  3. Run training with embedding and processed directory. Default settings (learning rate 0.005, batch size 1, log directory textual_inversion, prompt template file /Users/jesse/Documents/stable-diffusion-webui/textual_inversion_templates/style_filewords.txt, save images with embedding in PNG chunks.)
  4. Click train embedding

What should have happened?

The embedding should have trained at a rate of around 1.5-3.8 s/it. This is the same speed it usually runs at when generating an image from a prompt in the web UI, or when using a Google Colab for textual inversion training.

Commit where the problem happens

98947d1

What platforms do you use to access the UI?

MacOS

What browsers do you use to access the UI?

Google Chrome, Apple Safari

Command Line Arguments

I modified Run_webui_mac.sh to fix an error where all embeddings would fail to load, by disabling safe unpickling (I'm not running other people's embeddings, so I felt it was reasonably safe).

Here is the exact line:
python webui.py --disable-safe-unpickle --precision full --no-half --use-cpu Interrogate GFPGAN CodeFormer $@
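As a quick diagnostic for the slowdown discussed below, a minimal sketch (assuming a stock PyTorch build with MPS support, 1.12 or later) that reports which backend PyTorch would actually use. On an M1 Mac, if this prints "cpu", training speeds in the ~30 s/it range are expected; the `pick_device` helper is hypothetical, not part of the web UI:

```python
import torch

def pick_device() -> str:
    """Report the fastest backend this PyTorch build can use."""
    # getattr guard: older PyTorch builds have no torch.backends.mps at all
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"   # Apple Metal backend (M1/M2 GPU)
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU; unavailable on Apple Silicon
    return "cpu"       # slow fallback

print("selected device:", pick_device())
```

If this prints "mps" but training still crawls, individual unsupported ops may be falling back to the CPU rather than the whole model.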

Additional information, context and logs

Terminal window results:

Loaded a total of 1 textual inversion embeddings.
Embeddings: Don
100%|█████████████████████████████████████████████| 8/8 [00:01<00:00, 4.60it/s]
Training at rate of 0.005 until step 100000
Preparing dataset...
100%|█████████████████████████████████████████████| 7/7 [00:01<00:00, 3.73it/s]
[Epoch 0: 3/700] loss: 0.0034546: 0%| | 3/100000 [01:31<867:30:33, 31.23s/it]

marinohardin commented 1 year ago

I'm getting similar performance in Dreambooth now. I think it's falling back to CPU?

marinohardin commented 1 year ago

In my case, I was also getting ~30 s/it, and when I checked Activity Monitor, it was taking up ~30% CPU (on M1 Pro) and zero GPU. Can you check if that's happening for you too?

Do you get any errors or warnings in terminal about CUDA not found?

HamsterGerbil commented 1 year ago

I checked the GPU usage in Activity Monitor and found that on average it uses about 50% GPU, fluctuating between around 20% on the low end and around 70% at best. Meanwhile, standard image generation uses about 80% GPU. I'm still new to coding, so there might be something obvious I'm missing when reading the monitor.

[Screenshot: Activity Monitor, 2022-11-17 10:49 AM]
HamsterGerbil commented 1 year ago

I do get this warning about torch not being compiled with CUDA enabled.

Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
LatentDiffusion: Running in eps-prediction mode

Another warning I get is: WARNING: overwriting environment variables set in the machine overwriting variable {'PYTORCH_ENABLE_MPS_FALLBACK'}
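For context, that second warning only means the launcher script is overwriting an environment variable that was already set. Setting it is deliberate on Apple Silicon: `PYTORCH_ENABLE_MPS_FALLBACK=1` lets individual ops the MPS backend does not implement run on the CPU instead of raising an error. A sketch of the relevant launcher lines (the actual webui invocation is omitted):

```shell
# Allow per-op CPU fallback for MPS-unsupported ops; without this,
# PyTorch raises NotImplementedError instead of falling back.
export PYTORCH_ENABLE_MPS_FALLBACK=1
echo "PYTORCH_ENABLE_MPS_FALLBACK=$PYTORCH_ENABLE_MPS_FALLBACK"
```

Note this fallback is per-op; if many ops in the training loop are unsupported, most of the work still lands on the CPU.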

julianko13 commented 1 year ago

Did you check the memory usage? On my system, performance is only acceptable at 400x400 or below.

yrik commented 1 year ago

Same issue, super slow

Torcelllo commented 1 year ago

Super slow, usually 3 to 5 hours - now 29 hours!!!

PC, 64 GB RAM, NVIDIA GPU with 8 GB VRAM, SSD

andupotorac commented 1 year ago

The issue is it's reverting back to the CPU when you enable no-half and the other options. So this won't work, otherwise you would train an embedding for a month. https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon#poor-performance
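Based on the wiki page linked above, a hedged sketch of a launch command that avoids the global CPU fallback: drop the blanket --precision full / --no-half flags and only route the small auxiliary models to the CPU (flag names taken from the command line earlier in this thread; whether your model tolerates half precision on MPS may vary):

```shell
# Sketch: keep the main model on the MPS backend; push only
# interrogation and face restoration to the CPU.
python webui.py --use-cpu Interrogate GFPGAN CodeFormer $@
```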

ananoman commented 1 year ago

> The issue is it's reverting back to the CPU when you enable no-half and the other options. So this won't work, otherwise you would train an embedding for a month. https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon#poor-performance

So does this mean that there is no current solution for running a textual inversion on an M1 Mac, other than being willing to train an embedding for a month?!?

andupotorac commented 1 year ago

There is. I was able to train with A1111 but ran into issues with it, so I'm now using InvokeAI's terminal GUI with these settings, and it's been going fine. I still plan to try A1111 again for training at some point.

For LORAs I use Kohya.

[Screenshots: InvokeAI textual inversion training settings, 2023-06-21]
andupotorac commented 1 year ago

With these settings, for example, I trained a cap embedding, then put the cap on a goat. [Screenshots: example results]