facebookincubator / AITemplate

AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Apache License 2.0

Stable Diffusion examples generate bad images #131

Open tchaton opened 1 year ago

tchaton commented 1 year ago

Hey there,

I am trying to run the AITemplate Stable Diffusion examples on a T4 GPU.

I have tried with the same package versions as described in the README, using both master and this branch: https://github.com/facebookincubator/AITemplate/pull/74/commits/d62f0773eac88623846c192858e6028288054043

I am just getting crappy images.

Would it be possible for you to benchmark the model and validate that it works on a T4 GPU?

aiobotocore==2.4.1
aiohttp==3.8.3
aioitertools==0.11.0
aiosignal==1.3.1
aitemplate @ file:///content/AITemplate/python/dist/aitemplate-0.1.dev1-py3-none-any.whl
anyio==3.6.2
arrow==1.2.3
async-timeout==4.0.2
attrs==22.1.0
beautifulsoup4==4.11.1
black==22.12.0
bleach==5.0.1
blessed==1.19.1
botocore==1.27.59
certifi==2019.11.28
cffi==1.15.1
chardet==3.0.4
charset-normalizer==2.1.1
click==8.1.3
commonmark==0.9.1
croniter==1.3.8
cryptography==38.0.4
dbus-python==1.2.16
deepdiff==6.2.2
diffusers==0.3.0
distlib==0.3.6
dnspython==2.2.1
docker==6.0.1
docutils==0.19
email-validator==1.3.0
exceptiongroup==1.0.4
fastapi==0.88.0
filelock==3.8.2
frozenlist==1.3.3
fsspec==2022.11.0
h11==0.14.0
httpcore==0.16.2
httptools==0.5.0
httpx==0.23.1
huggingface-hub==0.11.1
idna==2.8
importlib-metadata==5.1.0
iniconfig==1.1.1
inquirer==3.0.0
isort==5.11.2
itsdangerous==2.1.2
jaraco.classes==3.2.3
jeepney==0.8.0
Jinja2==3.1.2
jmespath==1.0.1
keyring==23.11.0
lightning @ file:///content/lightning-1.9.0.dev0.tar.gz
lightning-cloud @ file:///content/lightning_cloud-0.5.13.tar.gz
lightning-launcher @ file:///content/lightning_launcher-0.0.42.tar.gz
lightning-utilities==0.4.2
MarkupSafe==2.1.1
more-itertools==9.0.0
multidict==6.0.3
mypy-extensions==0.4.3
numpy==1.24.0
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
ordered-set==4.1.0
orjson==3.8.3
packaging==22.0
pathspec==0.10.3
Pillow==9.3.0
pkginfo==1.9.2
platformdirs==2.6.0
pluggy==1.0.0
protobuf==3.20.1
psutil==5.9.4
pycparser==2.21
pydantic==1.10.2
Pygments==2.13.0
PyGObject==3.36.0
PyJWT==2.6.0
pytest==7.2.0
python-apt==2.0.0+ubuntu0.20.4.8
python-dateutil==2.8.2
python-dotenv==0.21.0
python-editor==1.0.4
python-multipart==0.0.5
PyYAML==6.0
readchar==4.0.3
readme-renderer==37.3
redis==4.4.0
regex==2022.10.31
requests==2.28.1
requests-toolbelt==0.10.1
requests-unixsocket==0.2.0
rfc3986==1.5.0
rich==12.6.0
s3fs==2022.11.0
SecretStorage==3.3.3
six==1.14.0
sniffio==1.3.0
soupsieve==2.3.2.post1
starlette==0.22.0
starsessions==1.3.0
tabulate==0.9.0
tensorboardX==2.5.1
tokenizers==0.12.1
tomli==2.0.1
torch==1.12.0+cu116
torchmetrics==0.11.0
tqdm==4.64.1
traitlets==5.7.1
transformers==4.21.2
twine==4.0.2
typing_extensions==4.4.0
ujson==5.6.0
urllib3==1.26.13
uvicorn==0.20.0
uvloop==0.17.0
virtualenv==20.17.1
watchfiles==0.18.1
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.4.2
websockets==10.4
wrapt==1.14.1
yarl==1.8.2
zipp==3.11.0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   28C    P8    14W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Best, T.C
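
When comparing an environment against the README, it can be easier to report only the handful of packages that matter rather than a full freeze. The following is a minimal sketch using the standard library; the helper name `installed_versions` and the choice of packages are illustrative assumptions, not something the AITemplate README prescribes.

```python
from importlib import metadata


def installed_versions(names):
    """Return {package name: installed version, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages picked for illustration from the list above.
    packages = ["torch", "diffusers", "transformers", "aitemplate"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

The printed versions can then be checked by hand against whatever the README lists for the commit being built.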

ipiszy commented 1 year ago

Unfortunately, not all kernels run well on SM75 GPUs. Check this README: https://fburl.com/pimcs20r.

tchaton commented 1 year ago

Hey @ipiszy. Thanks for answering. This page seems to be protected behind an internal Meta login.


I have several questions for you.

I currently have access to T4, A10 and V100.
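
For reference, the GPUs named in this thread span three architectures: V100 is SM70 (Volta), T4 is SM75 (Turing), and A100 and A10 are SM80 and SM86 (both Ampere). The sketch below shows one way to check which one a machine has. The helper names are hypothetical, and the "Ampere or newer" cutoff is only an illustrative assumption, not an official AITemplate support check. The `torch.cuda` calls run only if PyTorch and a GPU are present.

```python
# Compute capabilities (major, minor) of the GPUs mentioned in this thread.
KNOWN_GPUS = {
    "V100": (7, 0),  # Volta
    "T4": (7, 5),    # Turing, i.e. SM75
    "A100": (8, 0),  # Ampere
    "A10": (8, 6),   # Ampere
}


def sm_label(major, minor):
    """Format a compute capability as the usual SM string, e.g. (7, 5) -> 'SM75'."""
    return f"SM{major}{minor}"


def is_ampere_or_newer(major, minor):
    """Illustrative cutoff: True for compute capability 8.0 and above."""
    return (major, minor) >= (8, 0)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print(torch.cuda.get_device_name(0), sm_label(major, minor))
    else:
        # No GPU visible: just print the reference table.
        for name, capability in KNOWN_GPUS.items():
            print(name, sm_label(*capability), is_ampere_or_newer(*capability))
```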

terrychenism commented 1 year ago

We fully tested on Ampere GPUs (A100). Most kernels should work well on Turing GPUs (T4).

tchaton commented 1 year ago

@terrychenism. Great to know. Do you know which kernels could have issues on T4 and how to go about debugging them?

Any chance you could test this more thoroughly on T4 too? T4s are quite cheap GPUs for users to run their inference on, while A100s are quite high-end and less accessible.

chavinlo commented 1 year ago

Hello, I am having a similar issue. I am using an A100 as reported by nvidia-smi:

Fri Dec 23 21:47:17 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:00:06.0 Off |                    0 |
| N/A   26C    P0    49W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

I tried compiling https://huggingface.co/Red54/waifu-diffusion-v1-3-5-tainted with commit https://github.com/facebookincubator/AITemplate/commit/3625c3056041963926ba455347d9091fb891b872, because the latest version does not work well with pre-2.X SD models (#133).

Using the following prompt: Anime girl, Beautiful, masterpiece, Extremely Delicate Unity CG 8K-Wallpaper, Extremely Delicate Pixiv 8K-Illustration, Best Quality, Hyper Detailed, Intricate Details, Limited Palette, Photographic Incandescents, [Depth Of Field, Bokeh Effect], Focus On Character, Critical Angle, High_Quality ++Pretty Girl

This generates the following image: [image]

The result certainly has nothing to do with the prompt. The only modification I made was changing "Runwayml/Stable-diffusion-v1-5" to "Red54/waifu-diffusion-v1-3-5-tainted".

tchaton commented 1 year ago

@terrychenism Any updates or future plans to validate that T4 works well?

terrychenism commented 1 year ago

@terrychenism Any updates or future plans to validate that T4 works well?

cc @ipiszy

terrychenism commented 1 year ago

@tchaton I don't have access to a T4 GPU. Could you please run the model and identify the ops that are not supported on T4?

tchaton commented 1 year ago

@terrychenism Unfortunately, everything compiles properly and model inference runs, but the generated images are random noise, so I can't identify which operations are not compiled correctly.
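
One way to narrow this down when nothing fails outright is to dump intermediate outputs from a reference PyTorch run and from the compiled run on the same inputs, then look for the first stage where they disagree. The sketch below is plain Python under stated assumptions: the function names are hypothetical, outputs are assumed to be flattened to lists of floats beforehand (for example via `tensor.flatten().tolist()`), and the stage names and the `1e-2` tolerance are illustrative, not taken from AITemplate.

```python
import math


def max_abs_diff(reference, candidate):
    """Largest elementwise absolute difference between two flat lists of floats.

    Returns math.inf if the lengths differ or if any value is NaN or infinite.
    """
    if len(reference) != len(candidate):
        return math.inf
    worst = 0.0
    for r, c in zip(reference, candidate):
        if not (math.isfinite(r) and math.isfinite(c)):
            return math.inf
        worst = max(worst, abs(r - c))
    return worst


def first_divergence(reference_outputs, candidate_outputs, tolerance=1e-2):
    """Return (stage name, diff) for the first stage exceeding tolerance, else None.

    Stages are visited in the insertion order of reference_outputs.
    """
    for name, reference in reference_outputs.items():
        diff = max_abs_diff(reference, candidate_outputs[name])
        if diff > tolerance:
            return name, diff
    return None


if __name__ == "__main__":
    # Made-up numbers; "clip", "unet" and "vae" are placeholder stage names.
    reference = {"clip": [0.1, 0.2], "unet": [1.0, 2.0], "vae": [0.5, 0.5]}
    compiled = {"clip": [0.1, 0.2], "unet": [1.0, float("nan")], "vae": [0.5, 0.5]}
    print(first_divergence(reference, compiled))
```

Treating NaN and infinity as an immediate mismatch is a deliberate choice here, since non-finite values in an FP16 pipeline would otherwise slip past a plain subtraction-based comparison.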

@ipiszy Can you give me access to fburl.com/pimcs20r?

sgrigory commented 1 year ago

@ipiszy Can you give me access to fburl.com/pimcs20r?

@tchaton This link resolves to https://github.com/facebookincubator/AITemplate#installation, if that helps.