AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

EC2 inf2 (Inferentia2 chip) doesn't seem to be supported #10258

Open joliss opened 1 year ago

joliss commented 1 year ago

Is there an existing issue for this?

What happened?

I was interested in seeing how Stable Diffusion would perform on Amazon's inf2 instances, which run AWS's Inferentia2 chips.

Unfortunately, it doesn't seem to work at the moment.

Steps to reproduce the problem

  1. Launch an EC2 instance with AMI Deep Learning AMI GPU PyTorch 2.0.0 (Amazon Linux 2) 20230406, instance type inf2.xlarge. Log into it via SSH (user ec2-user).

  2. Install Python 3.10:

    sudo yum -y remove openssl-devel
    sudo yum -y install openssl11 openssl11-devel bzip2-devel libffi-devel
    curl -OL https://www.python.org/ftp/python/3.10.9/Python-3.10.9.tgz
    tar xf Python-3.10.9.tgz
    cd Python-3.10.9
    sudo ./configure --enable-optimizations
    sudo make altinstall
  3. Run stable-diffusion-webui:

    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
    cd stable-diffusion-webui
    python_cmd=python3.10 ./webui.sh
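Before step 3, a quick sanity check (an editor's sketch, not part of the original report) can confirm that the `python3.10` built in step 2 is actually the interpreter `webui.sh` will pick up via `python_cmd`:

```shell
# Verify that the freshly built interpreter is visible on the PATH.
if command -v python3.10 >/dev/null 2>&1; then
    python3.10 --version
else
    echo "python3.10 not found on PATH; check that 'sudo make altinstall' succeeded" >&2
fi
```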

What should have happened?

It should run, but instead I'm getting:

AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

See full log below.

Commit where the problem happens

5ab7f213bec2f816f9c5644becb32eb72c8ffb89

What platforms do you use to access the UI ?

Linux

What browsers do you use to access the UI ?

No response

Command Line Arguments

python_cmd=python3.10 ./webui.sh

List of extensions

None

Console logs

```
$ python_cmd=python3.10 ./webui.sh

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on ec2-user user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
./webui.sh: line 179: ldconfig: command not found
Cannot locate TCMalloc (improves CPU memory usage)
Python 3.10.9 (main, May 10 2023, 12:57:05) [GCC 7.3.1 20180712 (Red Hat 7.3.1-15)]
Commit hash: 5ab7f213bec2f816f9c5644becb32eb72c8ffb89
Installing torch and torchvision
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu118
Collecting torch==2.0.0
  Downloading https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp310-cp310-linux_x86_64.whl (2267.3 MB)
Collecting torchvision==0.15.1
  Downloading https://download.pytorch.org/whl/cu118/torchvision-0.15.1%2Bcu118-cp310-cp310-linux_x86_64.whl (6.1 MB)
Collecting filelock
  Downloading filelock-3.12.0-py3-none-any.whl (10 kB)
Collecting typing-extensions
  Downloading typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Collecting jinja2
  Downloading https://download.pytorch.org/whl/Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting sympy
  Downloading https://download.pytorch.org/whl/sympy-1.11.1-py3-none-any.whl (6.5 MB)
Collecting networkx
  Downloading networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting triton==2.0.0
  Downloading https://download.pytorch.org/whl/triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
Collecting requests
  Downloading requests-2.30.0-py3-none-any.whl (62 kB)
Collecting numpy
  Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting pillow!=8.3.*,>=5.3.0
  Downloading Pillow-9.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Collecting cmake
  Downloading cmake-3.26.3-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (24.0 MB)
Collecting lit
  Downloading lit-16.0.3.tar.gz (138 kB)
  Preparing metadata (setup.py) ... done
Collecting MarkupSafe>=2.0
  Downloading https://download.pytorch.org/whl/MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting urllib3<3,>=1.21.1
  Downloading urllib3-2.0.2-py3-none-any.whl (123 kB)
Collecting certifi>=2017.4.17
  Downloading certifi-2023.5.7-py3-none-any.whl (156 kB)
Collecting idna<4,>=2.5
  Downloading https://download.pytorch.org/whl/idna-3.4-py3-none-any.whl (61 kB)
Collecting charset-normalizer<4,>=2
  Downloading charset_normalizer-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (199 kB)
Collecting mpmath>=0.19
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, lit, cmake, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, triton, torch, torchvision
  DEPRECATION: lit is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for lit ... done
Successfully installed MarkupSafe-2.1.2 certifi-2023.5.7 charset-normalizer-3.1.0 cmake-3.26.3 filelock-3.12.0 idna-3.4 jinja2-3.1.2 lit-16.0.3 mpmath-1.3.0 networkx-3.1 numpy-1.24.3 pillow-9.5.0 requests-2.30.0 sympy-1.11.1 torch-2.0.0+cu118 torchvision-0.15.1+cu118 triton-2.0.0 typing-extensions-4.5.0 urllib3-2.0.2

[notice] A new release of pip available: 22.3.1 -> 23.1.2
[notice] To update, run: pip3.10 install --upgrade pip
Traceback (most recent call last):
  File "/home/ec2-user/src/stable-diffusion-webui/launch.py", line 352, in <module>
    prepare_environment()
  File "/home/ec2-user/src/stable-diffusion-webui/launch.py", line 257, in prepare_environment
    run_python("import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'")
  File "/home/ec2-user/src/stable-diffusion-webui/launch.py", line 120, in run_python
    return run(f'"{python}" -c "{code}"', desc, errdesc)
  File "/home/ec2-user/src/stable-diffusion-webui/launch.py", line 96, in run
    raise RuntimeError(message)
RuntimeError: Error running command.
Command: "/usr/local/bin/python3.10" -c "import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'"
Error code: 1
stdout:
stderr: Traceback (most recent call last):
  File "<string>", line 1, in <module>
AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
```

Additional information

Running Stable Diffusion on inf2 is reportedly supported as of May 5, 2023.

I'm not sure how difficult it is to actually get it running, so this issue might be more of a feature request than a bug. Please feel free to edit the title accordingly.

akx commented 1 year ago

Did you try the suggested --skip-torch-cuda-test to have it skip the test?

joliss commented 1 year ago

With --skip-torch-cuda-test, it seems to simply run on the CPU. (It doesn't explicitly say so in the output, but I'm getting the same slow performance as when I pass --use-cpu all.)

```
$ python_cmd=python3.10 ./webui.sh --skip-torch-cuda-test --no-half

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on ec2-user user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
./webui.sh: line 179: ldconfig: command not found
Cannot locate TCMalloc (improves CPU memory usage)
Python 3.10.9 (main, May 11 2023, 18:28:36) [GCC 7.3.1 20180712 (Red Hat 7.3.1-15)]
Commit hash: 5ab7f213bec2f816f9c5644becb32eb72c8ffb89
Installing requirements
Launching Web UI with arguments: --skip-torch-cuda-test --no-half
/home/ec2-user/src/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Warning: caught exception 'No CUDA GPUs are available', memory monitor disabled
No module 'xformers'. Proceeding without it.
Loading weights [6ce0161689] from /home/ec2-user/src/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Creating model from config: /home/ec2-user/src/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (InvokeAI).
Textual inversion embeddings loaded(0):
Model loaded in 1.2s (load weights from disk: 0.2s, create model: 0.4s, apply weights to model: 0.6s).
Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 4.9s (import torch: 0.9s, import gradio: 0.9s, import ldm: 0.5s, other imports: 0.4s, load scripts: 0.3s, load SD checkpoint: 1.3s, create ui: 0.4s).
100%|██████████| 20/20 [04:06<00:00, 12.31s/it]
Total progress: 100%|██████████| 20/20 [04:08<00:00, 12.42s/it]
^C Interrupted with signal 2
```
kgonia commented 1 year ago

@joliss You're mixing things up. It's possible to run any code on Inferentia2 if you adapt it first; A1111 will not work without changes.

sanguivore-easyco commented 1 year ago

I'm not familiar with the internals of A1111 yet, but for anyone more familiar who ends up looking into this more deeply, here's a notebook AWS published on getting Stable Diffusion to run on the inf2 architecture: https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_sd2_768_inference.ipynb
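If anyone does dig in: the core step in that notebook is ahead-of-time compilation, where the heavy parts of the pipeline (especially the UNet) are traced into Neuron-compiled modules and saved for reuse. A minimal sketch of the idea, assuming an inf2 instance with the Neuron SDK installed; `unet` is a placeholder for a loaded diffusers `UNet2DConditionModel`, the shapes match the SD2 768 model, and the real notebook wraps the UNet before tracing, so treat this as an outline rather than working code:

```python
import torch
import torch_neuronx  # part of the AWS Neuron SDK; only available on Neuron instances

# Example inputs matching the SD2 768x768 UNet (latents are 768/8 = 96 wide).
sample = torch.randn(1, 4, 96, 96)                # latent image
timestep = torch.tensor(999.)                     # diffusion timestep
encoder_hidden_states = torch.randn(1, 77, 1024)  # text-encoder output

# Compile the UNet for the NeuronCores and save the compiled module for reuse.
neuron_unet = torch_neuronx.trace(unet, (sample, timestep, encoder_hidden_states))
torch.jit.save(neuron_unet, "unet_neuron.pt")
```

The point for this issue is that A1111 loads and runs the model directly; wiring in a pre-compiled Neuron module like this would require changes to its model-loading and inference paths.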

wbean commented 11 months ago

Waiting for someone who is familiar with A1111 and can make it compatible with aws-neuron; it's too hard for me.

Shellmode commented 10 months ago

> Waiting for someone who is familiar with A1111 and can make it compatible with aws-neuron; it's too hard for me.

You need to compile the components of the SD model on an Inf2 instance and run them there; please refer to this blog post: Maximize Stable Diffusion performance and lower inference costs with AWS Inferentia2

You can also find several notebooks on GitHub for compiling different versions of SD models.
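For reference, those compile notebooks need the Neuron build of PyTorch and its compiler, which install from AWS's own pip repository. A sketch of the setup step, with package names taken from the AWS Neuron documentation; the unpinned versions and the diffusers/transformers additions here are illustrative, not taken from this thread:

```shell
# Install the Neuron PyTorch build and compiler from AWS's package index,
# plus diffusers/transformers for the Stable Diffusion pipeline itself.
python3 -m pip install --extra-index-url https://pip.repos.neuron.amazonaws.com \
    torch-neuronx neuronx-cc diffusers transformers
```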