AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0
140.72k stars 26.62k forks

[Feature Request]: Adding support for Google Cloud TPU VM (inference example included) #11416

Open aeroxy opened 1 year ago

aeroxy commented 1 year ago

Is there an existing issue for this?

What would your feature do?

Google offers Cloud TPU VMs, an excellent platform for running machine learning workloads. Currently, stable-diffusion-webui supports only GPUs, not TPUs. Adding TPU support through XLA would improve performance and broaden hardware compatibility.

Proposed workflow

  1. Execute the command: ./webui.sh.
  2. The script will automatically detect the TPU VM environment.
  3. Based on the identified environment, the script will install the necessary requirements.
  4. Upon successful installation, the TPU-enabled mode will commence.
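The environment detection in step 2 could be sketched roughly as follows. This is only an illustrative heuristic, not the webui's actual logic: it assumes Cloud TPU VMs expose `/dev/accel*` device nodes and that the `torch_xla` package is present only on such machines.

```python
import importlib.util
import os


def running_on_tpu_vm() -> bool:
    """Illustrative TPU-VM check (assumed heuristics, not webui code)."""
    # Cloud TPU VMs typically expose accelerator device nodes like /dev/accel0.
    if any(os.path.exists(f"/dev/accel{i}") for i in range(4)):
        return True
    # Fall back to checking whether the TPU runtime package is importable.
    return importlib.util.find_spec("torch_xla") is not None


if __name__ == "__main__":
    print("tpu" if running_on_tpu_vm() else "gpu/cpu")
```

A check like this would let webui.sh (or launch.py) branch into installing TPU requirements instead of the CUDA ones.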

Additional information

I have attempted to create a version that utilizes XLA instead of GPU, which can be found here: https://github.com/aeroxy/stable-diffusion-webui/tree/tpu

To use this version, first activate the virtual environment with source ./venv/bin/activate, then install the torch_xla[tpuvm] package with pip install torch_xla[tpuvm]. Once these steps are complete, run ./webui.sh again to start the server.
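Conceptually, once torch_xla is installed, the fork's device selection presumably swaps CUDA for XLA. A minimal sketch of that pattern (xm.xla_device() is the real torch_xla call; the fallback logic here is an illustration, not the fork's actual code):

```python
def pick_device_name() -> str:
    """Return the XLA device if torch_xla is importable, else fall back to CPU."""
    try:
        # Available after `pip install torch_xla[tpuvm]` on a TPU VM.
        import torch_xla.core.xla_model as xm
        return str(xm.xla_device())  # e.g. "xla:0" on a TPU VM
    except ImportError:
        return "cpu"


print(pick_device_name())
```

Model weights and input tensors would then be moved to that device with the usual `tensor.to(device)` calls.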

However, please note that an error may be encountered during the image generation process:

./webui.sh --enable-insecure-extension-access --skip-torch-cuda-test --share

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on aero user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0]
Version: v1.3.0-18-g30ddf52f
Commit hash: 30ddf52f374cfcbcf6003123952a79ba0e4e47a9
Installing requirements
Launching Web UI with arguments: --enable-insecure-extension-access --skip-torch-cuda-test --share
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx', memory monitor disabled
[AddNet] Updating model hashes...
100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 1545.43it/s]
[AddNet] Updating model hashes...
100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 1497.97it/s]
Loading weights [fc2511737a] from /home/aero/stable-diffusion-webui/models/Stable-diffusion/chilloutmix_NiPrunedFp32Fix.safetensors
Running on local URL:  http://127.0.0.1:7860
Creating model from config: /home/aero/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Running on public URL: https://4bf680b8db45b0e0a4.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Startup time: 9.5s (import torch: 1.5s, import gradio: 1.2s, import ldm: 0.4s, other imports: 0.8s, load scripts: 0.6s, create ui: 0.4s, gradio launch: 4.7s).
Applying optimization: InvokeAI... done.
Textual inversion embeddings loaded(0):
Model loaded in 6.4s (load weights from disk: 0.9s, create model: 0.6s, apply weights to model: 0.4s, apply half(): 0.1s, load VAE: 2.7s, move model to device: 1.6s).
  0%|                                                                                                                                                                                                                                                                                                  | 0/20 [00:00<?, ?it/s]src/tcmalloc.cc:332] Attempt to free invalid pointer 0x7f7a4b1f4080
https://symbolize.stripped_domain/r/?trace=7f7b3281c00b,7f7b3281c08f,fffffffff520ffff,e900000002bffe88&map=
*** SIGABRT received by PID 983208 (TID 984262) on cpu 40 from PID 983208; stack trace: ***
PC: @     0x7f7b3281c00b  (unknown)  raise
    @     0x7f79de467a1a       1152  (unknown)
    @     0x7f7b3281c090  (unknown)  (unknown)
    @ 0xfffffffff5210000  (unknown)  (unknown)
    @ 0xe900000002bffe89  (unknown)  (unknown)
https://symbolize.stripped_domain/r/?trace=7f7b3281c00b,7f79de467a19,7f7b3281c08f,fffffffff520ffff,e900000002bffe88&map=ceee8fa20ddf9c34af43f587221e91de:7f79d153f000-7f79de67e840
E0625 03:35:22.310722  984262 coredump_hook.cc:414] RAW: Remote crash data gathering hook invoked.
E0625 03:35:22.310757  984262 coredump_hook.cc:453] RAW: Skipping coredump since rlimit was 0 at process start.
E0625 03:35:22.310766  984262 client.cc:278] RAW: Coroner client retries enabled (b/136286901), will retry for up to 30 sec.
E0625 03:35:22.310775  984262 coredump_hook.cc:512] RAW: Sending fingerprint to remote end.
E0625 03:35:22.310782  984262 coredump_socket.cc:120] RAW: Stat failed errno=2 on socket /var/google/services/logmanagerd/remote_coredump.socket
E0625 03:35:22.310791  984262 coredump_hook.cc:518] RAW: Cannot send fingerprint to Coroner: [NOT_FOUND] Missing crash reporting socket. Is the listener running?
E0625 03:35:22.310797  984262 coredump_hook.cc:580] RAW: Dumping core locally.
E0625 03:35:22.683050  984262 process_state.cc:784] RAW: Raising signal 6 with default behavior
Aborted (core dumped)
ClashSAN commented 1 year ago

@aeroxy Python 3.8 isn't supported; only Python 3.10 is well-tested for normal use.
Do you still have the issue using 3.10? You only need to create and use a venv with 3.10, so don't worry if your installed Python isn't properly linked system-wide.

Very interesting for a quick hack, what kind of inference speeds are you getting?

aeroxy commented 1 year ago

> @aeroxy Python 3.8 isn't supported; only Python 3.10 is well-tested for normal use. Do you still have the issue using 3.10? You only need to create and use a venv with 3.10, so don't worry if your installed Python isn't properly linked system-wide.
>
> Very interesting for a quick hack, what kind of inference speeds are you getting?

The TPU library and TPU PyTorch only support Python 3.8.

I was able to generate 32 512x512 images in 100 seconds.
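For scale, the figures reported above work out to roughly 3 seconds per image:

```python
# Throughput from the numbers reported above (32 images, 100 seconds).
images, seconds = 32, 100
seconds_per_image = seconds / images
print(seconds_per_image)  # 3.125 s per 512x512 image
```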

AnxiJose commented 1 year ago

I actually ran a test using Python 3.10, applying all the changes the OP made, and managed to get it working:

Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on azerudream user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
python venv already activate or run without venv: /home/azerudream/stable-diffusion-webui
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.10.13 (main, Aug 25 2023, 13:20:03) [GCC 9.4.0]
Version: v1.5.2
Commit hash: c9c8485bc1e8720aba70f029d25cba1c4abf2b5c
Launching Web UI with arguments: --no-half --skip-torch-cuda-test --share --opt-sdp-attention --upcast-sampling
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx', memory monitor disabled
Loading weights [6ce0161689] from /home/azerudream/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Running on local URL:  http://127.0.0.1:7860
Creating model from config: /home/azerudream/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying attention optimization: sdp... done.
Model loaded in 4.0s (load weights from disk: 1.4s, create model: 1.3s, apply weights to model: 0.3s, move model to device: 0.7s, calculate empty prompt: 0.2s).
Running on public URL: https://67da8d1114e1890248.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Startup time: 10.5s (launcher: 0.3s, import torch: 2.9s, import gradio: 0.8s, setup paths: 0.8s, other imports: 0.5s, load scripts: 0.5s, create ui: 0.8s, gradio launch: 3.8s).
 30%|████████████████████████▎                                                        | 6/20 [08:39<23:22, 100.19s/it]
Total progress:  30%|███████████████████▊                                              | 6/20 [07:57<22:16, 95.47s/it

Maybe a little slow because of some data loading; hoping I can get some help to speed it up.

I used:

  - https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/11.8/torch-2.0-cp310-cp310-linux_x86_64.whl for torch
  - https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/11.8/torch_xla-2.0-cp310-cp310-linux_x86_64.whl for torch_xla

maxpain commented 1 year ago

@AnxiJose is it faster than GPU?

AnxiJose commented 1 year ago

As seen in the logs, I could only achieve around 100 seconds per iteration. I'm pretty sure most modern GPUs can achieve at least a few iterations per second. So no, it's slower by several orders of magnitude.
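Putting the two reports in this thread side by side, assuming 20 sampling steps per image as shown in the progress bars:

```python
# aeroxy (Python 3.8 TPU stack): 32 images in 100 s
tpu_py38_s_per_image = 100 / 32  # ~3.1 s/image

# AnxiJose (Python 3.10, CUDA-built XLA wheels): ~100.19 s per sampling step
steps_per_image = 20  # assumed, matching the 0/20 progress bars above
tpu_py310_s_per_image = 100.19 * steps_per_image  # ~2004 s/image

ratio = tpu_py310_s_per_image / tpu_py38_s_per_image
print(round(ratio))  # ~641x slower
```

The gap suggests the Python 3.10 run was not actually executing on the TPU (the wheels used were CUDA builds), which would be consistent with the "no NVIDIA driver" warning in its log.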