facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

-fstack-clash-protection'; did you mean '-fstack-protector'? (and -fcf-protection) #497

Open enn-nafnlaus opened 1 year ago

enn-nafnlaus commented 1 year ago

🐛 Bug

I get a ton of errors like:

c++.exec: error: unrecognized command line option '-fstack-clash-protection'; did you mean '-fstack-protector'?
c++.exec: error: unrecognized command line option '-fcf-protection'; did you mean '-fstack-protector'?
c++.exec: error: unrecognized command line option '-fstack-clash-protection'; did you mean '-fstack-protector'?
c++.exec: error: unrecognized command line option '-fcf-protection'; did you mean '-fstack-protector'?
c++.exec: error: unrecognized command line option '-fstack-clash-protection'; did you mean '-fstack-protector'?
c++.exec: error: unrecognized command line option '-fcf-protection'; did you mean '-fstack-protector'?
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/stablediffusion/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/usr/lib64/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
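Both rejected options are GCC hardening flags that, to the best of my knowledge, were introduced in GCC 8, so a compiler older than that standing in for `c++` would reject them. One possible source (an assumption, the log above does not confirm it) is the flag set baked into the distribution's Python build, which extension builds tend to inherit. A minimal sketch for checking that; `SUSPECT_FLAGS` and `find_suspect_flags` are illustrative names, not part of xformers or PyTorch:

```python
import sysconfig

# Flags the compiler in the log rejected (taken from the error messages).
SUSPECT_FLAGS = ("-fstack-clash-protection", "-fcf-protection")


def find_suspect_flags(flag_string, suspects=SUSPECT_FLAGS):
    """Return the tokens in flag_string that match a suspect flag.

    Matches both the bare flag and the '-flag=value' form.
    """
    tokens = (flag_string or "").split()
    return [
        tok for tok in tokens
        if any(tok == s or tok.startswith(s + "=") for s in suspects)
    ]


if __name__ == "__main__":
    # Which config variables matter is an assumption; these are common ones.
    for var in ("CFLAGS", "OPT", "CCSHARED"):
        value = sysconfig.get_config_var(var)
        print(var, find_suspect_flags(value))
```

If the flags show up there, the host compiler that nvcc picks needs to be new enough to understand them, or the flags need to be overridden for the build.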

Command

pip install git+https://github.com/facebookresearch/xformers

To Reproduce

Steps to reproduce the behavior:

pip install git+https://github.com/facebookresearch/xformers

Expected behavior

No errors. Package installs.

Environment

PyTorch version: 1.12.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Fedora Linux 36 (Workstation Edition) (x86_64)
GCC version: (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
Clang version: 14.0.5 (Fedora 14.0.5-1.fc36)
CMake version: version 3.24.1
Libc version: glibc-2.35

Python version: 3.10.7 (main, Sep 7 2022, 00:00:00) [GCC 12.2.1 20220819 (Red Hat 12.2.1-1)] (64-bit runtime)
Python platform: Linux-5.3.11-100.fc29.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060
Nvidia driver version: 515.65.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.0
[pip3] rotary-embedding-torch==0.1.5
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchvision==0.13.1

Additional context

danthe3rd commented 1 year ago

Hi @enn-nafnlaus

Would it be possible for you to update to CUDA 11? CUDA 10 is pretty old now, and you have a relatively recent version of GCC, which might not be supported by CUDA 10. Can you share the output of nvcc --version?

enn-nafnlaus commented 1 year ago

Will try to see if I can upgrade CUDA this evening - I have a training run going on right now. I will however note that this system was installed fresh just half a year ago, the latest Fedora version offered at the time (36), and this is the CUDA package autofetched from RPMFusion for that version, so if you're going to require more than that, there really should be a check. :)

nvcc not found. Will see if I can find a package to install it.
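The kind of up-front check suggested above could look roughly like the following. This is a sketch only, not what xformers' `setup.py` does: the function names are hypothetical, and the lookup order (`CUDA_HOME`, then `CUDA_PATH`, then `PATH`) is an assumption. It relies on `nvcc -V` printing a line such as `Cuda compilation tools, release 11.7, V11.7.99`.

```python
import os
import re
import shutil
import subprocess


def locate_nvcc():
    """Return a path to nvcc, or None if it cannot be found."""
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("nvcc")


def parse_nvcc_release(version_text):
    """Extract (major, minor) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_toolkit():
    """Return a human-readable status line instead of a raw traceback."""
    nvcc = locate_nvcc()
    if nvcc is None:
        return "nvcc not found: install the CUDA toolkit or set CUDA_HOME"
    output = subprocess.check_output([nvcc, "-V"], universal_newlines=True)
    return f"found {nvcc}, release {parse_nvcc_release(output)}"


if __name__ == "__main__":
    print(check_toolkit())
```

Running a check like this before the build starts would turn a missing toolkit into a one-line message rather than a failure deep inside the compile step.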

danthe3rd commented 1 year ago

Btw, if you use conda, you can have a separate environment with a different CUDA version, a different PyTorch setup, etc. We also provide a conda package with pre-built binaries.

nvcc not found

That's interesting, because nvcc should be used to build xformers. I'm surprised you didn't get an error earlier. Can you provide the complete log of the following:

FORCE_CUDA=1 pip install -v git+https://github.com/facebookresearch/xformers

enn-nafnlaus commented 1 year ago

Failed due to a lack of nvcc:

FORCE_CUDA=1 pip install -v git+https://github.com/facebookresearch/xformers
Using pip 21.3.1 from /usr/lib/python3.10/site-packages/pip (python 3.10)
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/facebookresearch/xformers
  Cloning https://github.com/facebookresearch/xformers to /tmp/pip-req-build-2d0ol4l5
  Running command git version
  git version 2.37.3
  Running command git clone --filter=blob:none -q https://github.com/facebookresearch/xformers /tmp/pip-req-build-2d0ol4l5
  Running command git rev-parse HEAD
  e23b369c094685bd42e11928649cc03b93b768d5
  Resolved https://github.com/facebookresearch/xformers to commit e23b369c094685bd42e11928649cc03b93b768d5
  Running command git submodule update --init --recursive -q
  Running command python setup.py egg_info
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-req-build-2d0ol4l5/setup.py", line 258, in <module>
      ext_modules=get_extensions(),
    File "/tmp/pip-req-build-2d0ol4l5/setup.py", line 197, in get_extensions
      cuda_version = get_cuda_version(CUDA_HOME)
    File "/tmp/pip-req-build-2d0ol4l5/setup.py", line 56, in get_cuda_version
      raw_output = subprocess.check_output([nvcc_bin, "-V"], universal_newlines=True)
    File "/usr/lib64/python3.10/subprocess.py", line 420, in check_output
      return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
    File "/usr/lib64/python3.10/subprocess.py", line 501, in run
      with Popen(*popenargs, **kwargs) as process:
    File "/usr/lib64/python3.10/subprocess.py", line 969, in __init__
      self._execute_child(args, executable, preexec_fn, close_fds,
    File "/usr/lib64/python3.10/subprocess.py", line 1845, in _execute_child
      raise child_exception_type(errno_num, err_msg, err_filename)
  FileNotFoundError: [Errno 2] No such file or directory: 'nvcc'
  Preparing metadata (setup.py) ... error
WARNING: Discarding git+https://github.com/facebookresearch/xformers. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

enn-nafnlaus commented 1 year ago

Also, re: prebuilt packages, what I really need is branch xformers@51dd119#egg=xformers for the memory-efficient attention, but I figured it'd be better to report the bug against the mainline package (since I get the same error either way)

danthe3rd commented 1 year ago

This has been merged and memory-efficient attention is already available on the main branch - might be easier.

Regarding the error:

Failed due to a lack of nvcc:

This means you need to install the CUDA toolkit.

enn-nafnlaus commented 1 year ago

Upgraded to CUDA 11 while my training run was still going and successfully installed xformers - thanks :). Though can't try anything out until the training run completes or blows up so I can swap out drivers to the upgraded one ;)

Thanks again!

danthe3rd commented 1 year ago

I might be wrong, but I don't think you need to change your driver if it is recent enough.
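As a rough illustration of "recent enough": the sketch below compares an installed driver version against a minimum. The threshold `450.80.02` is an assumption based on my recollection of NVIDIA's minor-version compatibility notes for CUDA 11.x on Linux, and should be checked against the official compatibility table. The helper names are hypothetical.

```python
def version_tuple(text):
    """Turn a dotted version such as '515.65.01' into (515, 65, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed, minimum):
    """True if the installed driver version is >= the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# ASSUMPTION: minimum Linux driver for CUDA 11.x minor-version compatibility.
ASSUMED_MIN_DRIVER_CUDA_11 = "450.80.02"

# 515.65.01 is the driver version reported in the environment section above.
print(driver_at_least("515.65.01", ASSUMED_MIN_DRIVER_CUDA_11))  # True
```

Note that this only compares version numbers. It says nothing about whether the loaded kernel module and the installed user-space libraries agree with each other, which is a separate way for CUDA initialization to fail.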

enn-nafnlaus commented 1 year ago

Automate seems to disagree, AFAIK :)

[2022-10-28 17:00:35,422] [WARNING] [runner.py:179:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/cuda/__init__.py:497: UserWarning: Can't initialize NVML ... Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)