cantrell / stable-diffusion-api-server

A local API server on top of Stable Diffusion.
Apache License 2.0
356 stars 61 forks

Very slow speed during image generation with 512x512 image #7

Open Gitterman69 opened 1 year ago

Gitterman69 commented 1 year ago

Am I missing some optimisations, or is image generation supposed to be that slow? AUTOMATIC1111 runs at 1-2 s/it for me locally on this M1 14" MacBook Pro... hope you can help me :)

* Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:1337
 * Running on http://192.168.178.23:1337
Press CTRL+C to quit
/Users/bamboozle/opt/miniconda3/envs/sd-api-server/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py:249: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1665904452771/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  text_embeddings = text_embeddings.repeat_interleave(num_images_per_prompt, dim=0)
  8%|████████                                                                                                | 1/13 [00:23<04:46, 23.91s/it]
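The warning above means `repeat_interleave` isn't implemented for the MPS backend in this PyTorch build, so that one op runs on the CPU. A minimal illustration of what the flagged line in `pipeline_stable_diffusion_img2img.py` does (not the server's actual code, just the op in isolation, run here on CPU):

```python
import torch

# The diffusers pipeline repeats each row of the text-embedding tensor
# num_images_per_prompt times along dim 0; this is the op the MPS
# backend falls back to CPU for.
text_embeddings = torch.arange(6.0).reshape(2, 3)  # toy stand-in, shape (2, 3)
num_images_per_prompt = 2
repeated = text_embeddings.repeat_interleave(num_images_per_prompt, dim=0)
print(repeated.shape)  # each of the 2 rows duplicated -> torch.Size([4, 3])
```

The fallback itself is cheap for a tensor this small; the 20+ s/it above points at the whole pipeline running on CPU, not at this one op.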
Gitterman69 commented 1 year ago

Tried to speed up the generation by changing the installation process a bit:

Didn't work - still very slow s/it (see below):

# Remove torch and all related packages
pip uninstall torch torchvision torchaudio -y

# Normally, we would install the latest nightly build of PyTorch here,
# But there's currently a performance regression in the latest nightly releases.
# Therefore, we're going to use this old version which doesn't have it.
# TODO: go back once fixed on PyTorch side
pip install --pre torch==1.13.0.dev20220922 torchvision -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html --no-deps

# Activate the MPS_FALLBACK conda environment variable
conda env config vars set PYTORCH_ENABLE_MPS_FALLBACK=1
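One gotcha with `conda env config vars set`: the variable only takes effect after the environment is deactivated and re-activated. A hypothetical sanity check from Python (this helper is not part of the server code; it just reads the process environment):

```python
import os

def mps_fallback_enabled(env=os.environ):
    """True if PYTORCH_ENABLE_MPS_FALLBACK is active in this process.

    Without this flag, unsupported MPS ops raise a NotImplementedError
    instead of silently falling back to the CPU.
    """
    return env.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1"

# Takes a mapping so it can be checked against any environment snapshot.
print(mps_fallback_enabled({"PYTORCH_ENABLE_MPS_FALLBACK": "1"}))  # True
```

If this prints False inside the running server, re-activate the `sd-api-server` environment before starting `server.py`.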

(sd-api-server) ➜  stable-diffusion-api-server-main python server.py
Fetching 16 files: 100%|█████████████████████| 16/16 [00:00<00:00, 21454.24it/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
 * Serving Flask app 'server'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:1337
 * Running on http://192.168.178.23:1337
Press CTRL+C to quit
/Users/bamboozle/opt/miniconda3/envs/sd-api-server/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py:249: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  text_embeddings = text_embeddings.repeat_interleave(num_images_per_prompt, dim=0)
 50%|██████████████████████                      | 7/14 [02:39<02:45, 23.67s/it]

any help would be highly appreciated!

jannichorst commented 1 year ago

Hi, ran into the same problem. Have a maxed-out M1 Max 16" and one iteration seems to take around 35 s/it... didn't run into any issues while setting things up. But going off Activity Monitor, it seems the GPU isn't utilised at all. Any idea how to force PyTorch to use the GPU?

UPDATE: Updating to the latest macOS version (13.0.1, from 12.1) resolved the issue. Now it's around 2 s/it. For anybody having the same issue, try executing the following in the server's environment:

import torch

this checks that MPS is available at runtime (requires macOS 12.3+):

print(torch.backends.mps.is_available())

this checks that the current PyTorch installation was built with MPS support:

print(torch.backends.mps.is_built())

Both of them should be True... if not, there is an issue with your PyTorch accessing the GPU. For more information read: https://towardsdatascience.com/installing-pytorch-on-apple-m1-chip-with-gpu-acceleration-3351dc44d67c
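Even with both checks passing, nothing runs on the GPU unless tensors and models are explicitly moved there. A minimal sketch of the usual device-selection pattern (generic PyTorch, not the server's code; on a non-Mac machine it simply falls back to CPU):

```python
import torch

# Pick the MPS device when available (macOS 12.3+ and an MPS-enabled
# PyTorch build), otherwise fall back to the CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Work only happens on the GPU if the tensors actually live there.
x = torch.ones(3, device=device)
print(x.device.type)
```

For a diffusers pipeline the equivalent step is calling `.to("mps")` on the pipeline object before generating; if the server never does this, everything stays on the CPU regardless of the checks above.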