dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

xformers without triton (openAI) and other issues. #346

Open cj401 opened 8 months ago

cj401 commented 8 months ago

Hi Dustin,

Thanks for your great work.

I was trying to run Mistral-7B on a Jetson Orin with JetPack (# R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref, EABI: aarch64, DATE: Tue Aug 1 19:57:35 UTC 2023).

I built a Docker image with xformers from this repo, since Mistral depends on xformers. However, running it did not succeed, due to the following issues:

python3 -m main demo mistral-7B-v0.1/
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
    Python  3.8.10 (you have 3.8.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/workspace/cj/.nlp/mistral-src/main.py", line 171, in <module>
    fire.Fire(
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/cj/.nlp/mistral-src/main.py", line 154, in demo
    res, _logprobs = generate(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/cj/.nlp/mistral-src/main.py", line 75, in generate
    prelogits = model.forward(
  File "/workspace/cj/.nlp/mistral-src/mistral/model.py", line 241, in forward
    return self.output(self.forward_partial(
  File "/workspace/cj/.nlp/mistral-src/mistral/model.py", line 228, in forward_partial
    h = layer(h, freqs_cis, cache_view)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/cj/.nlp/mistral-src/mistral/model.py", line 172, in forward
    r = self.attention.forward(self.attention_norm(x), freqs_cis, cache)
  File "/workspace/cj/.nlp/mistral-src/mistral/model.py", line 115, in forward
    output = memory_efficient_attention(xq, key, val, None if cache is None else cache.mask)
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
    return _memory_efficient_attention(
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/__init__.py", line 321, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/__init__.py", line 337, in _memory_efficient_attention_forward
    op = _dispatch_fw(inp, False)
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/dispatch.py", line 120, in _dispatch_fw
    return _run_priority_list(
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/dispatch.py", line 63, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 27, 32, 128) (torch.float16)
     key         : shape=(1, 27, 32, 128) (torch.float16)
     value       : shape=(1, 27, 32, 128) (torch.float16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
     p           : 0.0
`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
`flshattF@0.0.0` is not supported because:
    xFormers wasn't build with CUDA support
`tritonflashattF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.float16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 128

And here is the info about xformers:

python3 -m xformers.info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
    Python  3.8.10 (you have 3.8.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.23
memory_efficient_attention.cutlassF:               unavailable
memory_efficient_attention.cutlassB:               unavailable
memory_efficient_attention.decoderF:               unavailable
memory_efficient_attention.flshattF@0.0.0:         available
memory_efficient_attention.flshattB@0.0.0:         available
memory_efficient_attention.smallkF:                unavailable
memory_efficient_attention.smallkB:                unavailable
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         unavailable
indexing.scaled_index_addF:                        unavailable
indexing.scaled_index_addB:                        unavailable
indexing.index_select:                             unavailable
swiglu.dual_gemm_silu:                             unavailable
swiglu.gemm_fused_operand_sum:                     unavailable
swiglu.fused.p.cpp:                                not built
is_triton_available:                               False
pytorch.version:                                   2.0.0+nv23.05
pytorch.cuda:                                      available
gpu.compute_capability:                            8.7
gpu.name:                                          Orin
build.info:                                        available
build.cuda_version:                                1104
build.python_version:                              3.8.10
build.torch_version:                               2.0.0+nv23.05
build.env.TORCH_CUDA_ARCH_LIST:                    7.2;8.7
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
source.privacy:                                    open source

I wonder if you could help look into this. Thanks in advance.

dusty-nv commented 8 months ago

Hi @cj401, when I run python3 -m xformers.info, this is the output and CUDA is reported to be enabled:

docker run --runtime nvidia dustynv/xformers:r35.4.1 python3 -m xformers.info
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.22.post7
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.decoderF:               available
memory_efficient_attention.flshattF@v2.3.2:        available
memory_efficient_attention.flshattB@v2.3.2:        available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         unavailable
indexing.scaled_index_addF:                        unavailable
indexing.scaled_index_addB:                        unavailable
indexing.index_select:                             unavailable
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               False
pytorch.version:                                   2.0.0+nv23.05
pytorch.cuda:                                      available
gpu.compute_capability:                            8.7
gpu.name:                                          Orin
build.info:                                        available
build.cuda_version:                                1104
build.python_version:                              3.8.10
build.torch_version:                               2.0.0+nv23.05
build.env.TORCH_CUDA_ARCH_LIST:                    7.2;8.7
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
build.nvcc_version:                                11.4.315
source.privacy:                                    open source

I suspect that the mistralai package is installing a different version of xformers (0.0.23 in your case).

I'd either edit the Mistral source to not pin a specific xformers version (e.g. in requirements.txt or pyproject.toml), or uninstall/re-install xformers after installing Mistral and make sure xformers builds with CUDA enabled.

See here in the xformers Dockerfile for how it's installed: https://github.com/dusty-nv/jetson-containers/blob/master/packages/llm/xformers/Dockerfile
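
Roughly, the uninstall/re-install path would look something like this (a sketch only; the version pin and the --no-binary source build are just examples, not taken from the Dockerfile):

# sketch: replace the pip-pulled wheel with a CUDA-enabled source build
pip3 uninstall -y xformers
export TORCH_CUDA_ARCH_LIST="7.2;8.7"        # Xavier + Orin, matching the build info above
pip3 install -v --no-binary xformers xformers==0.0.22.post7
python3 -m xformers.info | grep cutlassF     # expect: available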

cj401 commented 8 months ago

Hi @dusty-nv thanks for your prompt reply.

When I run python3 -m xformers.info, I get the output below (Mistral doesn't need to be installed for this, and it has no pyproject.toml file either :) ).

I'm not sure if my host CUDA environment is messed up? I did not change the xformers Dockerfile when building this.

docker run --runtime nvidia cj_orin_container:r35.4.1 python3 -m xformers.info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
    Python  3.8.10 (you have 3.8.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.23
memory_efficient_attention.cutlassF:               unavailable
memory_efficient_attention.cutlassB:               unavailable
memory_efficient_attention.decoderF:               unavailable
memory_efficient_attention.flshattF@0.0.0:         available
memory_efficient_attention.flshattB@0.0.0:         available
memory_efficient_attention.smallkF:                unavailable
memory_efficient_attention.smallkB:                unavailable
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         unavailable
indexing.scaled_index_addF:                        unavailable
indexing.scaled_index_addB:                        unavailable
indexing.index_select:                             unavailable
swiglu.dual_gemm_silu:                             unavailable
swiglu.gemm_fused_operand_sum:                     unavailable
swiglu.fused.p.cpp:                                not built
is_triton_available:                               False
pytorch.version:                                   2.0.0+nv23.05
pytorch.cuda:                                      available
gpu.compute_capability:                            8.7
gpu.name:                                          Orin
build.info:                                        available
build.cuda_version:                                1104
build.python_version:                              3.8.10
build.torch_version:                               2.0.0+nv23.05
build.env.TORCH_CUDA_ARCH_LIST:                    7.2;8.7
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
source.privacy:                                    open source

Comparing your output to the one generated on my side, it seems `build.nvcc_version:` is missing.

On the host (native environment), I get the following:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0
dusty-nv commented 8 months ago

Not sure if my host cuda environment was messed up? I did not change the xformers dockerfile for building this.

Hi @cj401, it shouldn't be related to your host CUDA environment, because you're on JetPack 5 and CUDA is installed inside the containers there. What command(s) did you use to build your container? I just tried rebuilding my xformers container here, and it is still working as expected.

One other thing - have you set your default docker runtime to nvidia? https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#docker-default-runtime
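
For reference, that doc boils down to setting nvidia as the default runtime in /etc/docker/daemon.json and restarting the daemon, roughly:

sudo tee /etc/docker/daemon.json <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker
sudo docker info | grep 'Default Runtime'    # should print: Default Runtime: nvidia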

cj401 commented 8 months ago

@dusty-nv thanks for your prompt reply. Yes, that setting is already in place, as shown:

nvidia@nvidia-desktop:~$ sudo docker info | grep 'Default Runtime'
 Default Runtime: nvidia

The command line I used to build the xformers container is: ./build.sh --name=test_xformers xformers

At the end of the build, it emits:

testing xformers...
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
    Python  3.8.10 (you have 3.8.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.23
**memory_efficient_attention.cutlassF:               unavailable
memory_efficient_attention.cutlassB:               unavailable
memory_efficient_attention.decoderF:               unavailable**
memory_efficient_attention.flshattF@0.0.0:         available
memory_efficient_attention.flshattB@0.0.0:         available
**memory_efficient_attention.smallkF:                unavailable
memory_efficient_attention.smallkB:                unavailable**
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         unavailable
indexing.scaled_index_addF:                        unavailable
indexing.scaled_index_addB:                        unavailable
indexing.index_select:                             unavailable
**swiglu.dual_gemm_silu:                             unavailable
swiglu.gemm_fused_operand_sum:                     unavailable
swiglu.fused.p.cpp:                                not built**
is_triton_available:                               False
pytorch.version:                                   2.0.0+nv23.05
pytorch.cuda:                                      available
gpu.compute_capability:                            8.7
gpu.name:                                          Orin
build.info:                                        available
build.cuda_version:                                1104
build.python_version:                              3.8.10
build.torch_version:                               2.0.0+nv23.05
build.env.TORCH_CUDA_ARCH_LIST:                    7.2;8.7
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
source.privacy:                                    open source
xformers OK

As can be seen, the bold-highlighted parts are unavailable on my side, whereas they are available on yours. :)

I wonder if I should remove all the Docker images related to this and rebuild (a cleanup sketch follows the listing below), since docker images shows:

docker images
REPOSITORY                      TAG                       IMAGE ID       CREATED        SIZE
test_xformers                   r35.4.1                   223b5608c3ff   45 hours ago   12GB
test_xformers                   r35.4.1-xformers          223b5608c3ff   45 hours ago   12GB
cj_orin_container               r35.4.1                   223b5608c3ff   45 hours ago   12GB
cj_orin_container               r35.4.1-xformers          223b5608c3ff   45 hours ago   12GB
cj_orin_container               r35.4.1-pytorch           e26a554c8641   46 hours ago   11GB
test_xformers                   r35.4.1-pytorch           e26a554c8641   46 hours ago   11GB
cj_orin_container               r35.4.1-onnx              ed23f891f278   47 hours ago   10GB
test_xformers                   r35.4.1-onnx              ed23f891f278   47 hours ago   10GB
cj_orin_container               r35.4.1-cmake             deffd787f99f   47 hours ago   9.96GB
test_xformers                   r35.4.1-cmake             deffd787f99f   47 hours ago   9.96GB
cj_orin_container               r35.4.1-numpy             8f5b54034610   47 hours ago   9.9GB
test_xformers                   r35.4.1-numpy             8f5b54034610   47 hours ago   9.9GB
cj_orin_container               r35.4.1-python            51b626b7905b   47 hours ago   9.85GB
cj_orin_container               r35.4.1-tensorrt          51b626b7905b   47 hours ago   9.85GB
test_xformers                   r35.4.1-python            51b626b7905b   47 hours ago   9.85GB
test_xformers                   r35.4.1-tensorrt          51b626b7905b   47 hours ago   9.85GB
cj_orin_container               r35.4.1-build-essential   a17347bbf202   47 hours ago   9.76GB
cj_orin_container               r35.4.1-cuda              a17347bbf202   47 hours ago   9.76GB
cj_orin_container               r35.4.1-cudnn             a17347bbf202   47 hours ago   9.76GB
test_xformers                   r35.4.1-build-essential   a17347bbf202   47 hours ago   9.76GB
test_xformers                   r35.4.1-cuda              a17347bbf202   47 hours ago   9.76GB
test_xformers                   r35.4.1-cudnn             a17347bbf202   47 hours ago   9.76GB
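
(If a full rebuild is the way to go, I'd guess something like the following sketch, untagging the xformers-stage images first, though I'm not sure it's necessary:)

# sketch: untag the xformers-stage images, then rebuild
docker rmi test_xformers:r35.4.1-xformers cj_orin_container:r35.4.1-xformers
docker rmi test_xformers:r35.4.1 cj_orin_container:r35.4.1
./build.sh --name=test_xformers xformers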

Separately, I was trying to build xformers and Triton (OpenAI) from source and install them, but I still got issues with my build:

python3 -m xformers.info
WARNING[XFORMERS]: Need to compile C++ extensions to use all xFormers features.
    Please install xformers properly (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
xFormers 0.0.0
memory_efficient_attention.cutlassF:               unavailable
memory_efficient_attention.cutlassB:               unavailable
memory_efficient_attention.decoderF:               unavailable
memory_efficient_attention.flshattF@0.0.0:         unavailable
memory_efficient_attention.flshattB@0.0.0:         unavailable
memory_efficient_attention.smallkF:                unavailable
memory_efficient_attention.smallkB:                unavailable
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         available
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
swiglu.dual_gemm_silu:                             unavailable
swiglu.gemm_fused_operand_sum:                     unavailable
swiglu.fused.p.cpp:                                not built
is_triton_available:                               True
pytorch.version:                                   2.1.0a0+41361538.nv23.06
pytorch.cuda:                                      available
gpu.compute_capability:                            8.7
gpu.name:                                          Orin
build.info:                                        none
source.privacy:                                    open source

I reckon I probably misread something in the xformers setup.py. Thanks anyway.

cj401 commented 8 months ago

Hi @dusty-nv, I wonder if you used a command line similar to the one below to build the Docker image on your Orin. Thanks.

./build.sh --name=test_xformers xformers

dusty-nv commented 8 months ago

Hmm, yes, that is the same command I ran here, on the same version of JetPack-L4T...

xunkai55 commented 7 months ago

I got the same bug. When I set XFORMERS_MORE_DETAILS=1, it reveals a linkage error:

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
    Python  3.8.10 (you have 3.8.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/xformers/_cpp_lib.py", line 121, in _register_extensions
    torch.ops.load_library(ext_specs.origin)
  File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.8/dist-packages/xformers/_C.so: undefined symbol: _ZTIN4c10d12ProcessGroupE

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/xformers/_cpp_lib.py", line 131, in <module>
    _build_metadata = _register_extensions()
  File "/usr/local/lib/python3.8/dist-packages/xformers/_cpp_lib.py", line 123, in _register_extensions
    raise xFormersInvalidLibException(build_metadata) from exc
xformers._cpp_lib.xFormersInvalidLibException: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.0+nv23.05 with CUDA 1104 (you have 2.0.0+nv23.05)
    Python  3.8.10 (you have 3.8.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.

The way I built the image: ./build.sh --name=test_xformers xformers

xunkai55 commented 7 months ago

This smells largely related to torch.distributed, as this torch build (2.0.0+nv23.05) doesn't carry some of the distributed components.
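
(A quick way to check this, assuming the library paths from the traceback and the usual layout of the torch wheel:)

c++filt _ZTIN4c10d12ProcessGroupE        # demangles to: typeinfo for c10d::ProcessGroup
nm -D /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so \
    | grep -c ProcessGroup               # 0 suggests torch was built without distributed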

xunkai55 commented 7 months ago

In the container built on the Orin, inspecting torch.distributed.ProcessGroup gives:

>>> torch.distributed.ProcessGroup
<class 'torch.distributed._ProcessGroupStub'>
>>> torch.distributed.is_available()
False

Inspecting the same thing on my laptop (macOS) gives:

>>> torch.distributed.ProcessGroup
<class 'torch.distributed.distributed_c10d.ProcessGroup'>
>>> torch.distributed.is_available()
True
>>> torch.__config__.show()
'PyTorch built with:\n  - GCC 9.4\n  - C++ Version: 201703\n  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n  - LAPACK is enabled (usually provided by MKL)\n  - NNPACK is enabled\n  - CPU capability usage: NO AVX\n  - CUDA Runtime 11.4\n  - NVCC architecture flags: -gencode;arch=compute_72,code=sm_72;-gencode;arch=compute_87,code=sm_87\n  - CuDNN 8.6\n  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=11.4, CUDNN_VERSION=8.6.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=open, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=0, USE_NNPACK=1, USE_OPENMP=ON, USE_ROCM=OFF, \n'

@dusty-nv Could you help check this class in your xformers container? I suspect something changed in Torch that led to the missing symbol. Thanks!

xunkai55 commented 7 months ago

Okay, I got xformers working. Here's the finding:

Main reason: xformers requires PyTorch built with distributed support (pytorch-distributed).

My steps (a command sketch follows the xformers.info output below):

  1. Use jetson-containers' build.sh to build pytorch:2.0-distributed (change the version according to your L4T version) (takes super long, >2 hours).
  2. Fix the supported archs if needed (for me, I needed to add 8.7 to the supported archs), just like https://github.com/facebookresearch/xformers/issues/741#issuecomment-1541507146.
  3. Build xformers on top of the pytorch:2.0-distributed image (I didn't figure out how to skip the packages properly, so I ran the commands in the xformers Dockerfile step by step) (takes super long, >1 hour).
  4. Finally, run python3 -m xformers.info:
# python3 -m xformers.info

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.23.post1
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.decoderF:               available
memory_efficient_attention.flshattF@v2.3.6:        available
memory_efficient_attention.flshattB@v2.3.6:        available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         unavailable
indexing.scaled_index_addF:                        unavailable
indexing.scaled_index_addB:                        unavailable
indexing.index_select:                             unavailable
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               False
pytorch.version:                                   2.0.0
pytorch.cuda:                                      available
gpu.compute_capability:                            8.7
gpu.name:                                          Orin
dcgm_profiler:                                     unavailable
build.info:                                        available
build.cuda_version:                                1104
build.python_version:                              3.8.10
build.torch_version:                               2.0.0
build.env.TORCH_CUDA_ARCH_LIST:                    7.2;8.7
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
build.nvcc_version:                                11.4.315
source.privacy:                                    open source
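
Roughly, as commands (a sketch only; the --base flag in step 3 is my assumption of how to point build.sh at the distributed image; in practice I ran the Dockerfile commands step by step):

./build.sh --name=torch_dist pytorch:2.0-distributed                # step 1, >2 hours
export TORCH_CUDA_ARCH_LIST="7.2;8.7"                               # step 2: make sure 8.7 (Orin) is included
./build.sh --name=my_xformers --base=torch_dist:r35.4.1 xformers    # step 3, >1 hour
python3 -m xformers.info                                            # step 4: verify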

@dusty-nv Maybe it's worth documenting somewhere, or the xformers dependency could be changed from pytorch to pytorch-distributed (if possible). Thanks!

dusty-nv commented 7 months ago

Thanks for chasing this down @xunkai55 - it's unfortunate, because the only hard dependency on torch.distributed in xformers was added fairly recently, in its C++/C10 extension, and appears to be temporary in nature:

https://github.com/facebookresearch/xformers/blob/6600003c2314af88befcec2cd6662957a662981d/xformers/csrc/boxing_unboxing.cpp#L8

Right now I am building/testing xformers against pytorch:distributed instead of pytorch, and if it works well I will push the changes. Normally I would try to patch out the use of torch.distributed in the xformers code, so as not to alter the entire PyTorch build chain that other containers might rely on, but in this case it seems difficult and not worthwhile, and not many of my other containers rely on xformers. And this is really only a JetPack 5 issue, because on JetPack 6 my containers default to USE_DISTRIBUTED=on in PyTorch.

EDIT: on JetPack 5, I am also changing the pytorch:distributed alias to point to pytorch:2.1-distributed instead of pytorch:2.0-distributed, to hopefully avoid the sm_87/8.7 issue.
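
For reference, the PyTorch switch in question is just the standard build-time environment variable, roughly:

# sketch: build PyTorch from source with distributed (c10d) support enabled
export USE_DISTRIBUTED=1
export TORCH_CUDA_ARCH_LIST="7.2;8.7"    # Xavier + Orin
python3 setup.py bdist_wheel             # then pip3 install the resulting wheel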

dusty-nv commented 7 months ago

OK, the test build was a success, so I committed this in https://github.com/dusty-nv/jetson-containers/commit/6f071067de2b98828bfc91a9de55da3c805b8aca. The dependent containers are rebuilding now.