facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

Library raises `NameError` if _has_cpp_library is False #1000

Open swails opened 2 months ago

swails commented 2 months ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

Call `_matmul_with_mask` when `_has_cpp_library` is False. These lines show that `SparseCS` is imported only if `_has_cpp_library` is True.

However, not all of the `isinstance(thing, SparseCS)` checks are protected behind an `if _has_cpp_library` guard, which leads to the following exception in one of my examples:

            if (
>               not isinstance(mask, SparseCS)
                and mask.ndim == 3
                and mask.shape[0] != att.shape[0]
                and (att.shape[0] % mask.shape[0]) == 0
            ):
E           NameError: name 'SparseCS' is not defined

/venv/lib/python3.10/site-packages/xformers/components/attention/core.py:105: NameError
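The failure mode can be reproduced standalone: a class that is only bound behind a feature flag gets referenced in an unguarded `isinstance()` check. A minimal sketch (module, flag, and function names are illustrative, not xformers' actual code):

```python
# Minimal standalone sketch of the failure mode (hypothetical names, not
# the actual xformers code): SparseCS is only bound when the feature flag
# is True, but the isinstance() check below runs unconditionally.
_has_cpp_library = False  # simulating a build without the C++ extension

if _has_cpp_library:
    from xformers.sparse import SparseCS  # never executed in this sketch

def _matmul_with_mask(mask):
    # NameError: SparseCS was never imported, yet we reference it here.
    if not isinstance(mask, SparseCS) and getattr(mask, "ndim", 0) == 3:
        return "dense path"
    return "other path"

err_name = None
try:
    _matmul_with_mask(object())
except NameError as exc:
    err_name = type(exc).__name__
```

Because the `isinstance()` call evaluates before anything else in the condition, the function never reaches its fallback logic.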

Expected behavior

The code does not crash with a NameError

Environment

My exact environment is not easily recoverable. While reproducing it is possible, I believe the description above points to the root cause clearly enough.


danthe3rd commented 2 months ago

Hi, how did you install xformers? It looks like you didn't build the library.

swails commented 2 months ago

If you install from conda using the following environment.yaml file:

name: xformers-test
channels:
  - https://conda.anaconda.org/pytorch
  - https://conda.anaconda.org/xformers
  - https://conda.anaconda.org/dglteam/label/cu117
  - https://conda.anaconda.org/nvidia
  - https://conda.anaconda.org/conda-forge

dependencies:
  - python>=3.10
  - pip

  - pytorch >=2.0.0,<3.0.0a
  - pytorch-cuda==11.7
  - xformers==0.0.25

  - dgl ==1.1.0.cu117
  - torchmetrics >=0.10.0,<1.0.0a
  - pytorch-lightning >=1.7.3,<1.9.0

Then conda installs xformers from the xformers conda channel alongside a CPU-only (non-CUDA) PyTorch build, and the resulting xformers install is missing its C++/CUDA extensions.

That said, the way the code is written suggests it was intended to "work" in some fashion even without the C++ library available, but there are isinstance checks that break when it is missing.

Note that easing the pin on pytorch-cuda (so it's simply >=11.7,<12.0a) seems to resolve the issue (in that PyTorch is built with the CUDA bindings and xformers has the C++ library available).

That said, I think it would be worth clarifying this package's requirements and, at a minimum, raising a more useful exception when the C++ library is actually required, rather than an obscure NameError.

danthe3rd commented 2 months ago
name: xformers-test

This does not look like an official package. The package we provide is called xformers and not xformers-test, so I'm not sure where this is coming from... All the binaries we provide have the C++ library compiled already

swails commented 2 months ago

That's the name of the conda environment that gets created. The package is xformers==0.0.25 (see the dependencies section of the YAML file).

If you run conda env create -f environment.yaml with the YAML file I pasted above, you will find that the official binary released to the xformers conda channel fails to load the C++ extension.

All the binaries we provide have the C++ library compiled already

I am using a binary you're providing - xformers==0.0.25 from the https://conda.anaconda.org/xformers conda channel. The compiled library won't load properly if, at runtime, it can't find the .so files it was linked against when the binary was built:

(base) swails@supergirl [11:01:01 AM] [/usr/local/miniconda3/envs/xformers-test/lib/python3.10/site-packages/xformers] 
-> % ldd _C.so 
    linux-vdso.so.1 (0x00007fff28997000)
    libc10.so => not found
    libtorch.so => not found
    libtorch_cpu.so => not found
    libtorch_python.so => not found
    libcudart.so.11.0 => /usr/local/miniconda3/envs/xformers-test/lib/python3.10/site-packages/xformers/./../../../libcudart.so.11.0 (0x00007fa44e383000)
    libc10_cuda.so => not found
    libtorch_cuda.so => not found
    libstdc++.so.6 => /usr/local/miniconda3/envs/xformers-test/lib/python3.10/site-packages/xformers/./../../../libstdc++.so.6 (0x00007fa44e19e000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa44e04f000)
    libgcc_s.so.1 => /usr/local/miniconda3/envs/xformers-test/lib/python3.10/site-packages/xformers/./../../../libgcc_s.so.1 (0x00007fa44e034000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa44e011000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa44de1d000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa45aff9000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa44de17000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa44de0d000)
(base) swails@supergirl [11:01:05 AM] [/usr/local/miniconda3/envs/xformers-test/lib/python3.10/site-packages/xformers] 
-> % ldd _C_flashattention.so 
    linux-vdso.so.1 (0x00007ffe11be9000)
    libc10.so => not found
    libtorch.so => not found
    libtorch_cpu.so => not found
    libtorch_python.so => not found
    libcudart.so.11.0 => /usr/local/miniconda3/envs/xformers-test/lib/python3.10/site-packages/xformers/./../../../libcudart.so.11.0 (0x00007fef6f287000)
    libc10_cuda.so => not found
    libtorch_cuda.so => not found
    libstdc++.so.6 => /usr/local/miniconda3/envs/xformers-test/lib/python3.10/site-packages/xformers/./../../../libstdc++.so.6 (0x00007fef6f0a2000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fef6ef53000)
    libgcc_s.so.1 => /usr/local/miniconda3/envs/xformers-test/lib/python3.10/site-packages/xformers/./../../../libgcc_s.so.1 (0x00007fef6ef38000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fef6ef15000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fef6ed21000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fef7f878000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fef6ed1b000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fef6ed11000)

So sure, the extensions are compiled and shipped with the conda package, but if they can't find the dynamic libraries they were compiled against, they will fail to load and won't be available.

Since the environment I built did have PyTorch available (just not a CUDA-enabled build), I suspect the xformers binary you provide only works if both CUDA and a CUDA-enabled PyTorch are present (it would be nice to be able to test on hardware that doesn't have a CUDA accelerator).


This whole discussion is rather beside the point, though. The problem is that the code I pointed out doesn't work when the extensions are unavailable. The warnings printed when the C++ library fails to load, and the way the components/attention/core.py file is written, suggest the library is intended to work even when the C++ extensions can't be loaded (just more slowly and less memory-efficiently). However, it doesn't work, because a class used in an unguarded isinstance check is only imported when the C++ extension loads.

If this method doesn't make sense to invoke without the C++ extension, then something like this:

ImportError: compiled extension is required to run <this method> but is not available

seems like a much more helpful error message than what we're currently getting:

E           NameError: name 'SparseCS' is not defined

/venv/lib/python3.10/site-packages/xformers/components/attention/core.py:105: NameError

I think the reason we're getting this particular error message is pretty clear, and I would expect the fix to be fairly straightforward.
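For the "this method requires the extension" option, a small guard could turn the obscure NameError into the explicit ImportError described above. A sketch only; the decorator name is hypothetical and the flag name comes from the issue:

```python
# Hedged sketch: a decorator that raises an explicit ImportError when the
# compiled extension is missing, instead of failing later with NameError.
# Names are illustrative, not xformers' actual API.
import functools

_has_cpp_library = False  # simulating a build without the C++ extension

def requires_cpp_extension(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if not _has_cpp_library:
            raise ImportError(
                f"compiled extension is required to run {fn.__name__} "
                "but is not available"
            )
        return fn(*args, **kwargs)
    return wrapper

@requires_cpp_extension
def _matmul_with_mask(a, b, mask):
    ...  # the real implementation would live here

msg = None
try:
    _matmul_with_mask(None, None, None)
except ImportError as exc:
    msg = str(exc)
```

The caller now fails immediately at the public entry point, with the method name in the message, rather than deep inside the shape logic.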

danthe3rd commented 2 months ago

That's the name of the conda environment that gets created

Oh right - my bad.

The problem is that the code I pointed out doesn't work when the extensions are not available [...] the intent of the library is to work even if the C++ extensions can't be loaded

I agree on that part - the specific file you are trying to use could work without the C++ library. However, having fine-grained control over what can or cannot work without the C++ library would require us to guard all components. While we did that initially (and you pointed at some of those guards), we've since decided it's easier (for us) to just assume everyone has the C++ library - otherwise there is no expectation that anything in xFormers will work. (Furthermore, the component where you hit the error may be deprecated as part of https://github.com/facebookresearch/xformers/issues/848.)

However, if you want to open a PR to fix the issue, we will gladly accept it :)

swails commented 2 months ago

I definitely understand the reluctance to support countless different flavors of a library to accommodate arbitrary environments.

If this is indeed the decision (and I offer my meaningless support for it), then as a consumer of this library it would be significantly more helpful for xformers to fail with an ImportError immediately than to fail neck-deep in an inner function call with an obscure error exposing internal details of the library.

I'm not sure how ubiquitous the C++ library is here -- namely, would it be sensible to raise an ImportError in xformers/__init__.py when _has_cpp_library evaluates to False, so the library essentially pretends it isn't installed at all if the C++ library can't be loaded? Or is it only used in a couple of files, so the checks would be better placed there?

I could definitely raise a PR, but one of the authors would probably have a more informed opinion of where to put the checks.