Open cj401 opened 8 months ago
Hi @cj401, when I run python3 -m xformers.info, this is the output and CUDA is reported to be enabled:
docker run --runtime nvidia dustynv/xformers:r35.4.1 python3 -m xformers.info
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.22.post7
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@v2.3.2: available
memory_efficient_attention.flshattB@v2.3.2: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.0.0+nv23.05
pytorch.cuda: available
gpu.compute_capability: 8.7
gpu.name: Orin
build.info: available
build.cuda_version: 1104
build.python_version: 3.8.10
build.torch_version: 2.0.0+nv23.05
build.env.TORCH_CUDA_ARCH_LIST: 7.2;8.7
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 11.4.315
source.privacy: open source
I suspect that the mistralai package is installing a different version of xformers (0.0.23 in your case)
I'd either edit the mistral src to not specify a specific xformers version (e.g. in the requirements.txt or project.toml), or uninstall/re-install xformers after installing mistral and make sure xformers builds with CUDA enabled.
See here in the xformers dockerfile for how it's installed: https://github.com/dusty-nv/jetson-containers/blob/master/packages/llm/xformers/Dockerfile
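One quick way to confirm whether a later pip install replaced the container's xformers build is to compare the installed version against the one the container shipped with. This is a minimal sketch; the helper names installed_version and was_replaced are illustrative and not part of xformers, and the expected version is simply the one reported in the output above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def was_replaced(installed, expected):
    """True when a package is installed but is not the expected build."""
    return installed is not None and installed != expected


if __name__ == "__main__":
    # 0.0.22.post7 is the version reported by the dustynv/xformers container above.
    found = installed_version("xformers")
    print("installed xformers:", found)
    print("replaced by another install:", was_replaced(found, "0.0.22.post7"))
```

Running this inside the container before and after installing mistral should show whether the version changed.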
Hi @dusty-nv thanks for your prompt reply.
When I run python3 -m xformers.info, I get the output below (mistral does not need to be installed, and it has no project.toml file either :) ).
docker run --runtime nvidia cj_orin_container:r35.4.1 python3 -m xformers.info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
Python 3.8.10 (you have 3.8.10)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.23
memory_efficient_attention.cutlassF: unavailable
memory_efficient_attention.cutlassB: unavailable
memory_efficient_attention.decoderF: unavailable
memory_efficient_attention.flshattF@0.0.0: available
memory_efficient_attention.flshattB@0.0.0: available
memory_efficient_attention.smallkF: unavailable
memory_efficient_attention.smallkB: unavailable
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
swiglu.dual_gemm_silu: unavailable
swiglu.gemm_fused_operand_sum: unavailable
swiglu.fused.p.cpp: not built
is_triton_available: False
pytorch.version: 2.0.0+nv23.05
pytorch.cuda: available
gpu.compute_capability: 8.7
gpu.name: Orin
build.info: available
build.cuda_version: 1104
build.python_version: 3.8.10
build.torch_version: 2.0.0+nv23.05
build.env.TORCH_CUDA_ARCH_LIST: 7.2;8.7
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
source.privacy: open source
Comparing your output with mine, it seems build.nvcc_version is missing on my side.
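Spotting such differences by eye is error-prone, so the two reports can also be compared programmatically. This is a rough sketch that assumes the xformers.info output keeps the "key: value" layout shown in this thread; parse_info and diff_info are hypothetical helper names.

```python
import re

# Keys look like 'build.nvcc_version' or 'memory_efficient_attention.flshattF@v2.3.2'.
_KEY = re.compile(r"[A-Za-z_][\w.@]*")


def parse_info(text):
    """Parse 'key: value' lines from xformers.info output into a dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip().strip("*")  # drop markdown bold markers, if any
        key, sep, value = line.partition(": ")
        # Skip warnings, tracebacks and other free text.
        if sep and _KEY.fullmatch(key):
            entries[key] = value.strip()
    return entries


def diff_info(ours, theirs):
    """Return {key: (ours_value, theirs_value)} for entries that differ."""
    keys = sorted(set(ours) | set(theirs))
    return {k: (ours.get(k), theirs.get(k)) for k in keys if ours.get(k) != theirs.get(k)}
```

A key that is present in only one report shows up with None on the other side, which is how a missing build.nvcc_version entry would appear.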
On the host, in the native environment, I get the following:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0
Not sure if my host cuda environment was messed up? I did not change the xformers dockerfile for building this.
Hi @cj401, it shouldn't be related to your host CUDA environment, because you're on JetPack 5 and CUDA is installed inside the containers there. What command(s) did you use to build your container? I just tried rebuilding my xformers container here, and it is still working as expected.
One other thing - have you set your default docker runtime to nvidia? https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#docker-default-runtime
@dusty-nv thanks for your prompt reply. Yes, the default runtime is already set to nvidia, as shown below.
nvidia@nvidia-desktop:~$ sudo docker info | grep 'Default Runtime'
Default Runtime: nvidia
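The same setting can be checked in the Docker daemon configuration file. This is a minimal sketch, assuming the config lives at /etc/docker/daemon.json and uses the default-runtime key as described in the linked setup guide; default_runtime is an illustrative helper name.

```python
import json


def default_runtime(daemon_json_text):
    """Return the 'default-runtime' value from daemon.json text, or None."""
    config = json.loads(daemon_json_text)
    return config.get("default-runtime")


if __name__ == "__main__":
    # /etc/docker/daemon.json is the usual location on Linux; adjust if yours differs.
    try:
        with open("/etc/docker/daemon.json") as handle:
            print("default runtime:", default_runtime(handle.read()))
    except OSError:
        print("could not read daemon.json; Docker may be using its built-in defaults")
```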
The command line I used to build xformers container is:
./build.sh --name=test_xformers xformers
And at the end of this, it emits:
testing xformers...
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
Python 3.8.10 (you have 3.8.10)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.23
**memory_efficient_attention.cutlassF: unavailable
memory_efficient_attention.cutlassB: unavailable
memory_efficient_attention.decoderF: unavailable**
memory_efficient_attention.flshattF@0.0.0: available
memory_efficient_attention.flshattB@0.0.0: available
**memory_efficient_attention.smallkF: unavailable
memory_efficient_attention.smallkB: unavailable**
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
**swiglu.dual_gemm_silu: unavailable
swiglu.gemm_fused_operand_sum: unavailable
swiglu.fused.p.cpp: not built**
is_triton_available: False
pytorch.version: 2.0.0+nv23.05
pytorch.cuda: available
gpu.compute_capability: 8.7
gpu.name: Orin
build.info: available
build.cuda_version: 1104
build.python_version: 3.8.10
build.torch_version: 2.0.0+nv23.05
build.env.TORCH_CUDA_ARCH_LIST: 7.2;8.7
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
source.privacy: open source
xformers OK
The lines highlighted in bold are unavailable on my side, but they are available on yours. :)
I wonder if I should remove all the Docker images related to this and rebuild. Here is the current list:
docker images
REPOSITORY          TAG                       IMAGE ID       CREATED        SIZE
test_xformers       r35.4.1                   223b5608c3ff   45 hours ago   12GB
test_xformers       r35.4.1-xformers          223b5608c3ff   45 hours ago   12GB
cj_orin_container   r35.4.1                   223b5608c3ff   45 hours ago   12GB
cj_orin_container   r35.4.1-xformers          223b5608c3ff   45 hours ago   12GB
cj_orin_container   r35.4.1-pytorch           e26a554c8641   46 hours ago   11GB
test_xformers       r35.4.1-pytorch           e26a554c8641   46 hours ago   11GB
cj_orin_container   r35.4.1-onnx              ed23f891f278   47 hours ago   10GB
test_xformers       r35.4.1-onnx              ed23f891f278   47 hours ago   10GB
cj_orin_container   r35.4.1-cmake             deffd787f99f   47 hours ago   9.96GB
test_xformers       r35.4.1-cmake             deffd787f99f   47 hours ago   9.96GB
cj_orin_container   r35.4.1-numpy             8f5b54034610   47 hours ago   9.9GB
test_xformers       r35.4.1-numpy             8f5b54034610   47 hours ago   9.9GB
cj_orin_container   r35.4.1-python            51b626b7905b   47 hours ago   9.85GB
cj_orin_container   r35.4.1-tensorrt          51b626b7905b   47 hours ago   9.85GB
test_xformers       r35.4.1-python            51b626b7905b   47 hours ago   9.85GB
test_xformers       r35.4.1-tensorrt          51b626b7905b   47 hours ago   9.85GB
cj_orin_container   r35.4.1-build-essential   a17347bbf202   47 hours ago   9.76GB
cj_orin_container   r35.4.1-cuda              a17347bbf202   47 hours ago   9.76GB
cj_orin_container   r35.4.1-cudnn             a17347bbf202   47 hours ago   9.76GB
test_xformers       r35.4.1-build-essential   a17347bbf202   47 hours ago   9.76GB
test_xformers       r35.4.1-cuda              a17347bbf202   47 hours ago   9.76GB
test_xformers       r35.4.1-cudnn             a17347bbf202   47 hours ago   9.76GB
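In the listing above, the test_xformers and cj_orin_container tags share the same image IDs, which suggests both names point at the same underlying images rather than at separate builds. A small sketch for grouping tags by image ID, assuming the whitespace-separated column order shown here (tags_by_image_id is a hypothetical helper name):

```python
from collections import defaultdict


def tags_by_image_id(listing):
    """Group 'repository:tag' names by image ID from `docker images` text."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 3 or fields[0] == "REPOSITORY":
            continue
        repository, tag, image_id = fields[:3]
        groups[image_id].append(f"{repository}:{tag}")
    return dict(groups)
```

Because removing one tag of a multi-tagged image only untags it, a grouping like this shows which names have to go before the image itself is actually deleted.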
I also tried building xformers and triton (OpenAI) from source and installing them, but I still ran into issues with that build:
python3 -m xformers.info
WARNING[XFORMERS]: Need to compile C++ extensions to use all xFormers features.
Please install xformers properly (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
xFormers 0.0.0
memory_efficient_attention.cutlassF: unavailable
memory_efficient_attention.cutlassB: unavailable
memory_efficient_attention.decoderF: unavailable
memory_efficient_attention.flshattF@0.0.0: unavailable
memory_efficient_attention.flshattB@0.0.0: unavailable
memory_efficient_attention.smallkF: unavailable
memory_efficient_attention.smallkB: unavailable
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
swiglu.dual_gemm_silu: unavailable
swiglu.gemm_fused_operand_sum: unavailable
swiglu.fused.p.cpp: not built
is_triton_available: True
pytorch.version: 2.1.0a0+41361538.nv23.06
pytorch.cuda: available
gpu.compute_capability: 8.7
gpu.name: Orin
build.info: none
source.privacy: open source
I reckon I probably misread something in the setup.py from xformers. Thanks anyway.
Hi @dusty-nv, I wonder if you used a command line similar to the one below to build the Docker image on your Orin. Thanks.
./build.sh --name=test_xformers xformers
Hmm yes, that is the same that I ran here, on the same version of JetPack-L4T...
I got the same bug. When I set XFORMERS_MORE_DETAILS=1, the output points to a linkage error.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
Python 3.8.10 (you have 3.8.10)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/xformers/_cpp_lib.py", line 121, in _register_extensions
torch.ops.load_library(ext_specs.origin)
File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 643, in load_library
ctypes.CDLL(path)
File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.8/dist-packages/xformers/_C.so: undefined symbol: _ZTIN4c10d12ProcessGroupE
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/xformers/_cpp_lib.py", line 131, in <module>
_build_metadata = _register_extensions()
File "/usr/local/lib/python3.8/dist-packages/xformers/_cpp_lib.py", line 123, in _register_extensions
raise xFormersInvalidLibException(build_metadata) from exc
xformers._cpp_lib.xFormersInvalidLibException: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
Python 3.8.10 (you have 3.8.10)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
The way I build the image: ./build.sh --name=test_xformers xformers
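To reproduce the detailed traceback above, the info command can be launched with XFORMERS_MORE_DETAILS set. This is a sketch using only the standard library; details_env and run_info are illustrative names, and the environment variable is the one named in the warning text.

```python
import os
import subprocess
import sys


def details_env(base_env):
    """Copy an environment mapping and enable verbose xformers load errors."""
    env = dict(base_env)
    env["XFORMERS_MORE_DETAILS"] = "1"
    return env


def run_info():
    """Run `python -m xformers.info` with details enabled; return its output."""
    result = subprocess.run(
        [sys.executable, "-m", "xformers.info"],
        env=details_env(os.environ),
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    print(run_info())
```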
It looks closely related to torch.distributed, as this torch build (2.0.0+nv23.05) does not include the distributed components.
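The link to torch.distributed can be read off the missing symbol itself: under the Itanium C++ ABI name mangling, _ZTIN4c10d12ProcessGroupE is the typeinfo for c10d::ProcessGroup. The sketch below is a hand-rolled decoder for this one simple pattern only (a hypothetical helper, not a general demangler; a tool such as c++filt handles the general case).

```python
def demangle_typeinfo(symbol):
    """Decode a '_ZTIN<len><name>...E' symbol (typeinfo for a nested name)."""
    prefix = "_ZTIN"
    if not symbol.startswith(prefix) or not symbol.endswith("E"):
        raise ValueError("only simple nested typeinfo symbols are handled")
    body = symbol[len(prefix):-1]
    parts = []
    pos = 0
    while pos < len(body):
        # Each component is a decimal length followed by that many characters.
        end = pos
        while end < len(body) and body[end].isdigit():
            end += 1
        if end == pos:
            raise ValueError("expected a length prefix")
        length = int(body[pos:end])
        parts.append(body[end:end + length])
        pos = end + length
    return "typeinfo for " + "::".join(parts)


if __name__ == "__main__":
    print(demangle_typeinfo("_ZTIN4c10d12ProcessGroupE"))
```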
In the Docker image built on Orin, when I inspected torch.distributed.ProcessGroup, it said:
>>> torch.distributed.ProcessGroup
<class 'torch.distributed._ProcessGroupStub'>
>>> torch.distributed.is_available()
False
When I inspected the same class on my laptop (macOS), it said:
>>> torch.distributed.ProcessGroup
<class 'torch.distributed.distributed_c10d.ProcessGroup'>
>>> torch.distributed.is_available()
True
>>> torch.__config__.show()
'PyTorch built with:\n - GCC 9.4\n - C++ Version: 201703\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: NO AVX\n - CUDA Runtime 11.4\n - NVCC architecture flags: -gencode;arch=compute_72,code=sm_72;-gencode;arch=compute_87,code=sm_87\n - CuDNN 8.6\n - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=11.4, CUDNN_VERSION=8.6.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=open, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=0, USE_NNPACK=1, USE_OPENMP=ON, USE_ROCM=OFF, \n'
@dusty-nv Could you help check this class in your xformers Docker image? I suspect something changed in Torch that made the symbol disappear. Thanks!
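The two interpreter checks above can be folded into one helper that is easy to run in any container. This is a sketch under the assumption that a build without distributed support exposes a placeholder class whose name contains "Stub", as seen in the Orin output; describe_distributed is an illustrative name.

```python
def describe_distributed(dist):
    """Summarize whether a torch.distributed-like module has real c10d support."""
    available = bool(dist.is_available())
    group_cls = getattr(dist, "ProcessGroup", None)
    class_name = getattr(group_cls, "__name__", None)
    return {
        "available": available,
        "process_group": class_name,
        # A stub class suggests this torch build lacks distributed support.
        "is_stub": class_name is not None and "Stub" in class_name,
    }


if __name__ == "__main__":
    try:
        import torch.distributed as dist
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(describe_distributed(dist))
```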
Okay, I got xformers working. Here's the finding:
Main reason: xformers requires pytorch-distributed.
My steps:
1. Use build.sh to build a pytorch:2.0-distributed image (change the version according to the L4T versions) (takes super long, >2 hours)
2. Add 8.7 into the supported args, just like https://github.com/facebookresearch/xformers/issues/741#issuecomment-1541507146
3. Build xformers on top of the pytorch:2.0-distributed image (I didn't figure out how to skip the packages properly so I ran the commands in the xformers/Dockerfile step by step) (takes super long, >1 hour)
4. Run python3 -m xformers.info
# python3 -m xformers.info
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.23.post1
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@v2.3.6: available
memory_efficient_attention.flshattB@v2.3.6: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.0.0
pytorch.cuda: available
gpu.compute_capability: 8.7
gpu.name: Orin
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1104
build.python_version: 3.8.10
build.torch_version: 2.0.0
build.env.TORCH_CUDA_ARCH_LIST: 7.2;8.7
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 11.4.315
source.privacy: open source
@dusty-nv Maybe it is worth documenting this somewhere, or changing the dependency of xformers from pytorch to pytorch-distributed (if possible). Thanks!
Thanks for chasing this down @xunkai55 - it's unfortunate, because it seems the only hard dependency on torch.distributed in xformers was more recently added in its C++/C10 extension and appears to be temporary in nature:
Right now I am building/testing xformers against pytorch:distributed instead of pytorch, and if it works well I will push the changes. Normally I would try to patch out the use of torch.distributed in the xformers code so as not to alter the entire build chain of PyTorch that other containers might rely on, but in this case it seems difficult and not worthwhile, and not that many of my other containers rely on xformers. And this is really only a JetPack 5 issue, because on JetPack 6 in my containers I default to USE_DISTRIBUTED=on in PyTorch.
EDIT: on JetPack 5, I am also changing the pytorch:distributed alias to point to pytorch:2.1-distributed instead of pytorch:2.0-distributed, to hopefully avoid the sm_87/8.7 issue.
OK, the test build was a success, so I committed this in https://github.com/dusty-nv/jetson-containers/commit/6f071067de2b98828bfc91a9de55da3c805b8aca. The dependent containers are rebuilding now.
Hi Dustin,
thanks for your great work.
I was trying to run mistral-7b on Jetson ORIN with Jetpack (# R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref, EABI: aarch64, DATE: Tue Aug 1 19:57:35 UTC 2023).
I built a docker image with xformers from this repo to run it as mistral depends on xformers. However, it did not succeed due to the following issues:
and info about xformers
I wonder if you could help look into this. Thanks in advance.