MrCheeze opened 1 year ago
@MrCheeze There is a PR with the required MSVC fixes pending.

- VS 2022, Community or Professional recommended; there is an issue with a Build Tools-only install
- open the x64 Native Tools Command Prompt for VS 2022
- run `set AIT_USE_CMAKE_COMPILATION=1`
- run the compilation scripts (a sketch of the full sequence is below)
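For example, using the SDXL compile script referenced later in this thread (the exact invocation is illustrative):

```
:: inside the x64 Native Tools Command Prompt for VS 2022
cd ComfyUI\custom_nodes\AIT\AITemplate
set AIT_USE_CMAKE_COMPILATION=1
python examples\05_stable_diffusion\scripts\compile_sdxl.py
```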
There are also some other pending PRs:

- This PR is also required for XL.
- This PR adds support for 8px increments.
- This PR mirrors the UNet changes for XL already present here, and adds support for CLIP compatible with XL (pooled output/text embeds, and output hidden states/clip skip), although AIT CLIP is not used by this plugin.
In addition, my local version has a patch that mostly disables logging and disables profiling codegen to reduce output size (the benchmark stays enabled, as it is useful), and `AIT_USE_CMAKE_COMPILATION=1` is the default. The reason for the logging patch is that the full path is displayed, which looks messy in the logs and could inadvertently include the user's personal information.
I do not have a wheel for my current version, as I've made some of those changes directly to the installed version. However, I've attached that copy; it can replace an existing installation: aitemplate.zip
Also attached are precompiled modules. They are for sm80, batch size 1 (technically 1 to 2, for classifier-free guidance). I've opted for the `.dll` extension in the above PRs, while the module selection code in this repo uses `.so`; when @FizzleDorf uploads these to the huggingface repo they can be renamed to `.so`, or the code can be adjusted. Note the compressed extension is `.zip` but the files are actually `.xz`, and 7-Zip will recognise that.
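For example, with the 7-Zip command-line tool (assuming `7z` is on PATH):

```
:: the payload is xz despite the .zip name; -txz forces the archive type
7z x -txz unet_xl_1024.dll.zip
```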
unet_xl_1024.dll.zip
unet_xl_2048.dll.zip
unet_xl_1536.dll.zip
Expected VRAM usage according to AITemplate's memory planning:

- 1024: 721,141,760
- 1536: 991,784,960
- 2048: 1,360,564,480

plus the size of weights in fp16.
Fresh VAE modules, fp16, compatible with all SD VAEs that work in fp16; for XL use madebyollin/sdxl-vae-fp16-fix.

| Usage | VRAM (bytes) |
| --- | --- |
| encode_256 | 69,601,024 |
| decode_256 | 88,080,384 |
| encode_512 | 277,612,288 |
| decode_512 | 343,932,928 |
| encode_1024 | 1,110,445,568 |
| decode_1024 | 1,375,731,712 |
| encode_2048 | 4,435,480,064 |
| decode_2048 | 5,502,926,848 |

Plus the size of weights: approx 94 MB for the decoder, 65 MB for the encoder.
Thanks! Where do I put these? I tried placing them in ComfyUI\custom_nodes\AIT\AITemplate\modules and renaming them to .so, but it doesn't see them.
I'm on Linux (Ubuntu 22.04) and would also looooove to get AITemplate support for SDXL. I downloaded the modules from here: https://huggingface.co/city96/AITemplate/tree/main/hlky and placed them in ComfyUI\custom_nodes\AIT\AITemplate\modules, but it also doesn't detect them. What do I have to do now? Thank you very much @hlky
User selection of modules was replaced with an automatic selection based on OS, CUDA arch, batch size and resolution, so the modules need to be uploaded to huggingface and then added to modules.json. Until then you could add them to modules.json yourself; some details of the format: the `sd` field is set up to be `xl`, and the `sha256` field is used as the filename. `v1` can be used for the module selection as they are compatible; the above attached VAE modules are replacements for some that are already in modules.json. The sha256 is different, but the new modules can be named the same as the existing sha256 to get them working now. `url` does not need to be valid if the file exists. Modules themselves go in the same folder as modules.json.
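To illustrate, a hypothetical entry might look like the sketch below; `sd`, `sha256` and `url` are the fields described above, while the remaining key names are guesses based on the selection criteria (OS, CUDA arch, batch size, resolution):

```json
{
  "sd": "xl",
  "sha256": "<hash of the module; the file on disk is named after this>",
  "url": "<does not need to be valid if the file already exists locally>",
  "os": "windows",
  "cuda": "sm80",
  "batch_size": 1,
  "resolution": 1024
}
```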
@CyberTimon Those particular modules are for Windows. I don't have WSL set up at the moment, but I will compile Linux XL modules soon. Just to note, CMake-related things do not affect Linux.
Thank you for your answer! So I can't use these files (https://huggingface.co/city96/AITemplate/tree/main/hlky)? I saw that you made a draft to compile SDXL on the AITemplate repo. Can I compile it on my own right now? Another question: I always use DreamShaper XL (a fine-tuned SDXL); can I still use it when I compile the base SDXL 1.0 repo? Thanks again.
> can I still use it when I compile the base SDXL 1.0 repo?

Yes. AIT modules are architecture specific, not checkpoint specific (as far as I'm aware, at least).
Wow! I didn't know that, very cool! I always had to recompile and redo every TensorRT model.
> I will compile linux XL modules soon.

@hlky Great, thank you!
XL compilation works at the moment with that PR and the others mentioned, 860 and 875. With Linux it is easier; check the readme installation instructions. There is also a Docker image, although there are not many dependencies to install other than CUDA and the usual `build-essential` or equivalent.
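If you go the Docker route, the usual shape is something like this (the build script path and image tag follow AITemplate's Docker setup as I remember it; `--gpus all` requires the NVIDIA Container Toolkit):

```sh
# from an AITemplate checkout: build the CUDA image, tagged "ait"
bash docker/build.sh cuda
# run with GPU access; without --gpus the container cannot see the GPU
docker run --gpus all -it ait
```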
The PRs should be merged before installation/creating the docker image, or changes can be made to an already installed package (e.g. `venv/Lib/site-packages/aitemplate`). The attached build of AITemplate is set up for Windows with `AIT_USE_CMAKE_COMPILATION=1` as default; on Linux it would need `export AIT_USE_CMAKE_COMPILATION=0`.

Yes, the modules are architecture specific, not checkpoint specific. This plugin applies weights at runtime from the model loaded in the workflow. On Linux, weights of any size can be included in the module; on Windows, including XL weights does not work due to the size (I think the limit is 2 GB), so it works for v1/v2.
Oh, so the modules you linked here don't work on Windows yet? I mean the DLL ones.
@Shaistrong The SDXL modules attached here with .dll extension, and the existing SDXL modules on huggingface are for Windows.
Oh, gotcha. So when will modules.json be updated? Or is there a version of the node that lets you select the module?
```python
@click.option(
    "--include-constants",
    default=True,
    type=bool,
    help="include constants (model weights) with compiled model",
)
```

Hey @hlky, do I need to set this to False to only get these small files? Thank you
It's downloading the SDXL pipeline! Hope this works; it would be soooo cool. I left the above option at its default (True); if you tell me it should be False, I can easily restart it.
Nice! I already got the Windows ones, but I can't get the custom node to see them.
@Shaistrong @FizzleDorf is maintaining the huggingface repo where the modules will be uploaded; they should be around later. Automatic selection predates XL support and the latest ComfyUI compatibility patches, so reverting it is not a quick fix.
@CyberTimon Yes, set `--include-constants False`. The released modules are then compressed with `xz` to save bandwidth/storage. The plugin works with the compressed files if needed, but it's best to extract them after download; afaik the plugin decompresses them now, or at least the temporary space usage issue should be solved. A module compiled with weights will still work with this plugin, though.

Let me know if there are any issues. Profiling will take several minutes, and compiling the module will take several minutes; for reference, on an E5-2686 v4 (18c/36t) the XL UNet takes ~9-10 minutes to compile.
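In other words, roughly (the module filename below is illustrative):

```sh
# compile without baked-in weights, then compress the module for release
python examples/05_stable_diffusion/scripts/compile_sdxl.py --include-constants False
xz -9 unet_xl_1024.so   # produces unet_xl_1024.so.xz; xz -d reverses it
```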
Hey @hlky, I'm getting this issue:

```
2023-08-07 22:44:42,612 INFO <aitemplate.backend.builder> Using 20 CPU for building
2023-08-07 22:44:42,612 INFO <aitemplate.backend.builder> combined 3 profiler sources into 3
2023-08-07 22:44:42,612 INFO <aitemplate.backend.builder> compiling 3 profiler sources
2023-08-07 22:44:42,612 INFO <aitemplate.backend.builder> linking 3 profiler executables
2023-08-07 22:45:01,331 INFO <aitemplate.backend.builder> make stdout:
make: Entering directory '/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/tmp/profiler'
rm -f gemm_rcr_bias/gemm_rcr_bias_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 gemm_rcr_bias_add/gemm_rcr_bias_add_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 argmax/argmax
make: Leaving directory '/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/tmp/profiler'
make: Entering directory '/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/tmp/profiler'
nvcc -t=0 -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -w -gencode=arch=compute_80,code=[sm_80,compute_80] -O3 -std=c++17 --expt-relaxed-constexpr -DNDEBUG --use_fast_math -DCUTLASS_USE_TANH_FOR_SIGMOID=1 -DAIT_USE_FAST_MATH=1 -Xcompiler=-fPIC -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -Xcompiler -fvisibility=hidden -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/../static/include/kernels -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/include -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/tools/util/include -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/examples/35_gemm_softmax -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/examples/41_fused_multi_head_attention -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/examples/45_dual_gemm -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/../../backend/cuda/attention/src/./ -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/../../backend/cuda/attention/src/fmha -o argmax/argmax argmax/argmax.cu
make: Leaving directory '/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/tmp/profiler'
make: Entering directory '/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/tmp/profiler'
nvcc -t=0 -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -w -gencode=arch=compute_80,code=[sm_80,compute_80] -O3 -std=c++17 --expt-relaxed-constexpr -DNDEBUG --use_fast_math -DCUTLASS_USE_TANH_FOR_SIGMOID=1 -DAIT_USE_FAST_MATH=1 -Xcompiler=-fPIC -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -Xcompiler -fvisibility=hidden -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/../static/include/kernels -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/include -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/tools/util/include -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/examples/35_gemm_softmax -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/examples/41_fused_multi_head_attention -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/examples/45_dual_gemm -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/../../backend/cuda/attention/src/./ -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/../../backend/cuda/attention/src/fmha -o gemm_rcr_bias/gemm_rcr_bias_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 gemm_rcr_bias/gemm_rcr_bias_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3.cu
make: Leaving directory '/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/tmp/profiler'
make: Entering directory '/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/tmp/profiler'
nvcc -t=0 -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -w -gencode=arch=compute_80,code=[sm_80,compute_80] -O3 -std=c++17 --expt-relaxed-constexpr -DNDEBUG --use_fast_math -DCUTLASS_USE_TANH_FOR_SIGMOID=1 -DAIT_USE_FAST_MATH=1 -Xcompiler=-fPIC -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -Xcompiler -fvisibility=hidden -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/../static/include/kernels -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/include -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/tools/util/include -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/examples/35_gemm_softmax -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/examples/41_fused_multi_head_attention -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/examples/45_dual_gemm -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/../../backend/cuda/attention/src/./ -I/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/3rdparty/cutlass/../../backend/cuda/attention/src/fmha -o gemm_rcr_bias_add/gemm_rcr_bias_add_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 gemm_rcr_bias_add/gemm_rcr_bias_add_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3.cu
make: Leaving directory '/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/tmp/profiler'
2023-08-07 22:45:01,331 INFO <aitemplate.backend.builder> make stderr:
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
make: *** [Makefile_eab8fad1b4b5db6abe91e2741f11521303099665:6: argmax/argmax] Error 255
make: *** Waiting for unfinished jobs....
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
make: *** [Makefile_eab8fad1b4b5db6abe91e2741f11521303099665:9: gemm_rcr_bias/gemm_rcr_bias_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3] Error 255
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
make: *** [Makefile_eab8fad1b4b5db6abe91e2741f11521303099665:12: gemm_rcr_bias_add/gemm_rcr_bias_add_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3] Error 255
Traceback (most recent call last):
File "/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/scripts/compile_sdxl.py", line 212, in <module>
compile_diffusers()
File "/home/cybertimon/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/cybertimon/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/cybertimon/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/cybertimon/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/scripts/compile_sdxl.py", line 110, in compile_diffusers
compile_clip(
File "/home/cybertimon/Repositories/ComfyUI/custom_nodes/AIT/AITemplate/examples/05_stable_diffusion/src/compile_lib/compile_clip_alt.py", line 104, in compile_clip
compile_model(
File "/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/utils/misc.py", line 93, in inner_function
return f(*args, **kwargs)
File "/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/compiler/compiler.py", line 276, in compile_model
compiler.transform.profile(
File "/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/compiler/transform/profile.py", line 92, in profile
compile_engine.make_profilers(generated_profilers, profiler_dir)
File "/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/backend/builder.py", line 835, in make_profilers
_run_make_cmds(
File "/home/cybertimon/.local/lib/python3.10/site-packages/aitemplate/backend/builder.py", line 183, in _run_make_cmds
raise RuntimeError("Build has failed.")
RuntimeError: Build has failed.
```
Do you need any more info about my system, etc.? I have an RTX 3090 and Ubuntu 22.04.
NVCC output:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
```
nvidia-smi says CUDA Version: 12.2, but nvcc somehow says 11.5. Is this related to the issue? And how should I fix it?
@CyberTimon I think CUDA 11.5 is too old; the earliest mentioned on the AITemplate repo is 11.6. You might have multiple versions installed and need to set some environment variables. iirc nvidia-smi displays the CUDA version supported by the installed driver, and the CUDA toolkit is a separate install.
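For example, to check what's actually being picked up, and to point the build at a newer toolkit (the install path below is an assumption):

```sh
nvcc --version    # the toolkit nvcc on PATH (11.5 in your log)
nvidia-smi        # reports the driver's supported CUDA version, a separate thing
# if a newer toolkit is installed, put it first on PATH:
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
```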
I also see `/usr/include/c++/11`, but it should be using `-std=c++17`; maybe some packages are also too old. I used the Docker image before, so I'd try that.
Thanks. I'm new to Docker, but I will try to compile it with Docker. 👍 Thank you again. Also, I'll try to upgrade my CUDA version. Is 11.7 good, or should I install 11.8?
Anything past 11.6 should be good; I'd grab the latest for Ubuntu 22.04. I use 12.x on Windows.
```
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
```
For this part, I think update `gcc` or something; it's using `c++/11`, which suggests C++17 is unavailable.
How can I update gcc? I can't find any info. I executed sudo apt-get install for gcc/g++ 12, but I can't go past 12 because it doesn't find the package. Also, I tried to run it with Docker, but I can't get past this:

```
cybertimon@server:~/Repositories/ComfyUI/custom_nodes/AIT/AITemplate$ sudo docker run ait
==========
== CUDA ==
==========
CUDA Version 11.6.2
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
cybertimon@server:~/Repositories/ComfyUI/custom_nodes/AIT/AITemplate$ sudo apt-get update \
&& sudo apt-get install -y nvidia-container-toolkit-base
Hit:1 http://ch.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://ch.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 https://download.docker.com/linux/ubuntu jammy InRelease
Hit:4 http://ch.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:5 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:6 https://packagecloud.io/ookla/speedtest-cli/ubuntu jammy InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package nvidia-container-toolkit-base
```
We are adding a lot of noise to this thread which is supposed to be about Windows. I'm not sure what's wrong with your environment and it's difficult to diagnose. I think it's best if you wait for me to compile Linux modules and then wait for them to be added to the plugin. I am beginning the process of compiling some SDXL Linux modules now.
Oh great, and yes, you're right. Something is always wrong with my NVIDIA version.. haha. Will try to fix it myself. Thank you very much for helping me. Please ping me when you've compiled some SDXL Linux modules! :) Thanks
When is FizzleDorf going to add the modules so that the node will detect them?
SDXL modules now load as of commit 942680d.
Is there any information on how to reproduce the Windows-native modules you have provided at https://huggingface.co/Fizzledorf/AITemplate_V1_V2/tree/main/modules/windows? From what I've seen, the AITemplate codebase seems very Linux-specific, and I can't find any information anywhere on cross-compiling from Linux to Windows either (not even with WSL). How were those modules made?