HamsterGerbil opened this issue 1 year ago
I'm getting similar performance in Dreambooth now. I think it's falling back to CPU?
In my case, I was also getting ~30 s/it, and when I checked Activity Monitor, it was taking up ~30% CPU (on M1 Pro) and zero GPU. Can you check if that's happening for you too?
Do you get any errors or warnings in terminal about CUDA not found?
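A quick way to confirm which backend PyTorch will actually use is to query it directly. This is a sketch, not webui code: `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are the standard PyTorch queries, and the small helper below just encodes the usual preference order.

```python
def choose_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Pick a device string given the two PyTorch availability checks."""
    if cuda_ok:
        return "cuda"   # NVIDIA GPU
    if mps_ok:
        return "mps"    # Apple Silicon GPU via Metal
    return "cpu"        # training will be extremely slow here

# On a machine with torch installed you would call:
#   import torch
#   choose_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```

If this returns `"cpu"` on an M1 machine, the ~30 s/it behavior described above is expected.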
I checked the GPU usage in Activity Monitor and found that it averages about 50% GPU, fluctuating between roughly 20% on the low end and 70% at best. Meanwhile, standard image generation uses about 80% GPU. I'm still new to coding, so there might be something obvious I'm missing when reading the monitor.
I do get this warning about torch not being compiled with CUDA enabled.
```
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
LatentDiffusion: Running in eps-prediction mode
```

Another warning I get is:

```
WARNING: overwriting environment variables set in the machine
overwriting variable {'PYTORCH_ENABLE_MPS_FALLBACK'}
```
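For reference, a minimal sketch of setting that flag from Python, based on PyTorch's documented `PYTORCH_ENABLE_MPS_FALLBACK` behavior (the webui normally reads it from the launch environment instead):

```python
import os

# PYTORCH_ENABLE_MPS_FALLBACK=1 tells PyTorch to run ops that the MPS backend
# doesn't support on the CPU instead of raising an error. It must be set
# before `import torch` for PyTorch to see it.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```

Note that the fallback trades errors for speed: any op that falls back runs on the CPU, which may explain part of the slowdown.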
Did you check the memory usage? On my system, it's only acceptable at 400x400 or below.
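As a rough rule of thumb (an assumption, not a measured figure), activation memory during training grows with the pixel count of the training resolution, so dropping from 512x512 to 400x400 cuts it noticeably:

```python
def pixel_ratio(w1: int, h1: int, w2: int, h2: int) -> float:
    """Relative pixel count between two training resolutions.

    Activation memory scales roughly with this factor
    (a rule-of-thumb assumption, not a measurement).
    """
    return (w1 * h1) / (w2 * h2)

print(round(pixel_ratio(512, 512, 400, 400), 2))  # 512x512 has ~1.64x the pixels of 400x400
```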
Same issue, super slow
Super slow here too: training that usually takes 3 to 5 hours is now estimated at 29 hours!!!
PC, 64 GB RAM, NVIDIA GPU with 8 GB VRAM, SSD
The issue is that it reverts to the CPU when you enable no-half and the other options, so this won't work; otherwise you'd be training an embedding for a month. https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon#poor-performance
So does this mean that there is no current solution for running a textual inversion on an M1 Mac, other than being willing to train an embedding for a month?!?
There is. I was able to train with A1111 but got issues with it, so I'm now using InvokeAI's Terminal GUI with these settings. And it's been going fine. Still plan to try A1111 again at some point for training too.
For LORAs I use Kohya.
With these settings for example I trained a cap embedding, then I put it on a goat.
Is there an existing issue for this?
What happened?
First, I have to say thank you to AUTOMATIC1111 and the devs for your incredible work; this is an incredible tool. Right now I'm trying to use textual inversion and/or hypernetwork training, but after creating the embedding and processing my images, when I run the training process each step takes approximately thirty seconds for both textual inversion and hypernetworks.
Any help would be greatly appreciated. I really love the webui and textual inversion is simply astonishing.
Steps to reproduce the problem
What should have happened?
The embedding should have trained at a rate of around 1.5-3.8 s/it. This is the same speed it usually runs at when generating an image in the webui from a prompt, or when using a Google Colab for textual inversion training.
Commit where the problem happens
98947d1
What platforms do you use to access UI ?
MacOS
What browsers do you use to access the UI ?
Google Chrome, Apple Safari
Command Line Arguments
Additional information, context and logs
Terminal window results:

```
Loaded a total of 1 textual inversion embeddings.
Embeddings: Don
100%|█████████████████████████████████████████████| 8/8 [00:01<00:00, 4.60it/s]
Training at rate of 0.005 until step 100000
Preparing dataset...
100%|█████████████████████████████████████████████| 7/7 [00:01<00:00, 3.73it/s]
[Epoch 0: 3/700]loss: 0.0034546: 0%| | 3/100000 [01:31<867:30:33, 31.23s/it]
```
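The ~867-hour ETA in that log follows directly from the per-step time; a quick sanity check (`eta_hours` is a hypothetical helper for illustration, not part of the webui):

```python
def eta_hours(seconds_per_it: float, total_steps: int) -> float:
    """Total wall-clock time, in hours, if every step takes `seconds_per_it`."""
    return seconds_per_it * total_steps / 3600

print(round(eta_hours(31.23, 100_000), 1))  # ~867.5 hours, matching tqdm's estimate
```

At the expected 1.5-3.8 s/it the same 100,000 steps would take roughly 42 to 106 hours, which is why the CPU fallback matters so much here.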