HazyResearch / H3

Language Modeling with the H3 State Space Model
Apache License 2.0

Having trouble compiling fftconv #18

Status: Closed (Doraemonzzz closed this issue 1 year ago)

Doraemonzzz commented 1 year ago

Thanks for your great work. I am having difficulty compiling fftconv. Could you share the torch and CUDA versions you used, along with any other relevant environment variables? One more question: do you plan to release a version of the fftconv code like the following?

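```python
import torch
# FFT-based convolution along the last dim (circular, since there is no zero-padding)
u_f = torch.fft.fft(u)
k_f = torch.fft.fft(k)
y_f = u_f * k_f
y = torch.fft.ifft(y_f)  # complex-valued; take .real for real inputs
```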
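The snippet above multiplies unpadded length-L spectra, which yields a circular convolution: outputs past position L wrap around into the start of y. A common way to get a causal (linear) convolution instead is to zero-pad both signals to at least 2L before the FFT and keep the first L outputs. The sketch below compares both variants against a direct O(L^2) convolution. It assumes u and k are real tensors whose last dimension is the sequence length; the function names are hypothetical and this is not code from the H3 repository.

```python
import torch


def fft_conv_circular(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Circular convolution along the last dim (the pattern in the question)."""
    return torch.fft.ifft(torch.fft.fft(u) * torch.fft.fft(k)).real


def fft_conv_causal(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Linear (causal) convolution: zero-pad to 2L so nothing wraps around."""
    L = u.shape[-1]
    n = 2 * L
    u_f = torch.fft.rfft(u, n=n)  # real input -> half spectrum, zero-padded to n
    k_f = torch.fft.rfft(k, n=n)
    return torch.fft.irfft(u_f * k_f, n=n)[..., :L]  # keep the first L outputs


def direct_causal_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """O(L^2) reference: y[t] = sum over s <= t of k[s] * u[t - s]."""
    L = u.shape[-1]
    y = torch.zeros_like(u)
    for t in range(L):
        for s in range(t + 1):
            y[..., t] += k[..., s] * u[..., t - s]
    return y


# Usage: compare both FFT variants against the direct convolution.
torch.manual_seed(0)
u = torch.randn(64, dtype=torch.float64)
k = torch.randn(64, dtype=torch.float64)
ref = direct_causal_conv(u, k)
print(torch.allclose(fft_conv_causal(u, k), ref))    # True
print(torch.allclose(fft_conv_circular(u, k), ref))  # False: the tail wraps around
```

Using `rfft`/`irfft` rather than `fft`/`ifft` exploits the fact that the inputs are real, so only half the spectrum is computed and the output comes back real-valued.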
DanFu09 commented 1 year ago

Thanks for the question! Can you paste in what output you get from trying to compile it?

```bash
cd csrc/fftconv
pip install -e .
```
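Since the original question also asks about versions, it can help to attach the versions the extension build will actually see. The following is a minimal sketch using standard PyTorch attributes; `build_env_report` is a hypothetical helper name. `CUDA_HOME` is the toolkit location that `torch.utils.cpp_extension` resolves for nvcc and headers (it is `None` if no toolkit is found), so comparing it with `torch.version.cuda` shows whether the toolkit differs from the CUDA version PyTorch was built against.

```python
import platform

import torch
from torch.utils.cpp_extension import CUDA_HOME


def build_env_report() -> dict:
    """Collect versions relevant to compiling a PyTorch CUDA extension."""
    has_gpu = torch.cuda.is_available()
    return {
        "python": platform.python_version(),
        "torch": torch.__version__,
        # CUDA version PyTorch was compiled against (None for CPU-only builds)
        "torch.version.cuda": torch.version.cuda,
        # Toolkit root that cpp_extension uses to locate nvcc and CUDA headers
        "CUDA_HOME": CUDA_HOME,
        "cuda_available": has_gpu,
        "gpu": torch.cuda.get_device_name(0) if has_gpu else None,
    }


if __name__ == "__main__":
    for key, value in build_env_report().items():
        print(f"{key}: {value}")
```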

Here is an example of the reference FFTConv: https://github.com/HazyResearch/H3/blob/main/src/ops/fftconv.py#L15

The CUDA kernel also fuses in the residual connection, GELU, and dropout, so we benchmark against those as well.
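As a rough picture of what fusing the residual connection, GELU, and dropout into the convolution means, here is an unfused PyTorch sketch of that sequence of steps. It is not the code at the linked URL. Assumptions (all hypothetical): u has shape (B, H, L), k has shape (H, L), D is a per-channel skip weight of shape (H,), and dropout is applied with `F.dropout`. Run eagerly like this, each step is a separate operation that writes an intermediate tensor to GPU memory, which is the kind of overhead a single fused kernel avoids.

```python
import torch
import torch.nn.functional as F


def fftconv_unfused(u, k, D, dropout_p=0.0, training=False):
    """Unfused sketch: causal FFT conv -> residual -> GELU -> dropout.

    Assumed (hypothetical) shapes: u (B, H, L), k (H, L), D (H,).
    """
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the FFT product is a linear, not circular, conv
    y = torch.fft.irfft(torch.fft.rfft(u, n=n) * torch.fft.rfft(k, n=n), n=n)[..., :L]
    out = y + u * D.unsqueeze(-1)  # residual/skip term scaled per channel
    out = F.gelu(out)
    return F.dropout(out, p=dropout_p, training=training)


# Usage with random inputs (shapes are illustrative)
B, H, L = 2, 4, 128
u, k, D = torch.randn(B, H, L), torch.randn(H, L), torch.randn(H)
out = fftconv_unfused(u, k, D, dropout_p=0.1, training=True)
print(out.shape)  # torch.Size([2, 4, 128])
```

A quick sanity check of the sketch: with an identity kernel (k[:, 0] = 1, zeros elsewhere), D = 0, and dropout disabled, the output reduces to `F.gelu(u)`.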

Doraemonzzz commented 1 year ago

Here is the log:

Looking in indexes: https://pypi.douban.com/simple/
Obtaining file:///data/user/code/H3/csrc/fftconv
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Installing collected packages: fftconv
  Attempting uninstall: fftconv
    Found existing installation: fftconv 0.1
    Can't uninstall 'fftconv'. No files were found to uninstall.
  Running setup.py develop for fftconv
    error: subprocess-exited-with-error

    × python setup.py develop did not run successfully.
    │ exit code: 1
    ╰─> [144 lines of output]

        torch.__version__  = 1.10.0+cu111

        running develop
        running egg_info
        writing fftconv.egg-info/PKG-INFO
        writing dependency_links to fftconv.egg-info/dependency_links.txt
        writing top-level names to fftconv.egg-info/top_level.txt
        reading manifest file 'fftconv.egg-info/SOURCES.txt'
        writing manifest file 'fftconv.egg-info/SOURCES.txt'
        running build_ext
        building 'fftconv' extension
        Emitting ninja build file /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/build.ninja...
        Compiling objects...
        Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
        /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/easy_install.py:156: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
          warnings.warn(
        /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
          warnings.warn(
        /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py:782: UserWarning: The detected CUDA version (11.2) has a minor version mismatch with the version that was used to compile PyTorch (11.1). Most likely this shouldn't be a problem.
          warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
        [1/2] /usr/bin/g++-7 -MMD -MF /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/user/code/H3/csrc/fftconv/mathdx/22.02/include -I/data/user/code/H3/csrc/fftconv/cub-1.17.2 -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/TH -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.2/include -I/home/company/user/anaconda3/envs/lra/include/python3.8 -c -c /data/user/code/H3/csrc/fftconv/fftconv.cpp -o /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o -g -march=native -funroll-loops -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fftconv -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
        FAILED: /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o
        /usr/bin/g++-7 -MMD -MF /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/user/code/H3/csrc/fftconv/mathdx/22.02/include -I/data/user/code/H3/csrc/fftconv/cub-1.17.2 -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/TH -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.2/include -I/home/company/user/anaconda3/envs/lra/include/python3.8 -c -c /data/user/code/H3/csrc/fftconv/fftconv.cpp -o /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o -g -march=native -funroll-loops -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fftconv -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
        cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
        /data/user/code/H3/csrc/fftconv/fftconv.cpp:6:10: fatal error: cuda/std/complex: No such file or directory
         #include <cuda/std/complex>
                  ^~~~~~~~~~~~~~~~~~
        compilation terminated.
        [2/2] /usr/local/cuda-11.2/bin/nvcc  -I/data/user/code/H3/csrc/fftconv/mathdx/22.02/include -I/data/user/code/H3/csrc/fftconv/cub-1.17.2 -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/TH -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.2/include -I/home/company/user/anaconda3/envs/lra/include/python3.8 -c -c /data/user/code/H3/csrc/fftconv/fftconv_cuda.cu -o /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --threads 4 -lineinfo --use_fast_math -std=c++17 -arch=compute_70 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fftconv -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /usr/bin/gcc-7
        FAILED: /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv_cuda.o
        /usr/local/cuda-11.2/bin/nvcc  -I/data/user/code/H3/csrc/fftconv/mathdx/22.02/include -I/data/user/code/H3/csrc/fftconv/cub-1.17.2 -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/TH -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.2/include -I/home/company/user/anaconda3/envs/lra/include/python3.8 -c -c /data/user/code/H3/csrc/fftconv/fftconv_cuda.cu -o /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --threads 4 -lineinfo --use_fast_math -std=c++17 -arch=compute_70 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fftconv -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /usr/bin/gcc-7
        In file included from /usr/local/cuda-11.2/include/thrust/system/cuda/detail/execution_policy.h:33:0,
                         from /usr/local/cuda-11.2/include/thrust/iterator/detail/device_system_tag.h:23,
                         from /usr/local/cuda-11.2/include/thrust/iterator/iterator_traits.h:111,
                         from /usr/local/cuda-11.2/include/thrust/detail/type_traits/pointer_traits.h:23,
                         from /usr/local/cuda-11.2/include/thrust/type_traits/is_contiguous_iterator.h:27,
                         from /usr/local/cuda-11.2/include/thrust/type_traits/is_trivially_relocatable.h:19,
                         from /usr/local/cuda-11.2/include/thrust/detail/complex/complex.inl:20,
                         from /usr/local/cuda-11.2/include/thrust/complex.h:1031,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/util/complex.h:8,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/util/Half.h:14,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/core/ScalarType.h:5,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/core/Scalar.h:11,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Operators.h:13,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3,
                         from /data/user/code/H3/csrc/fftconv/fftconv_cuda.cu:3:
        /usr/local/cuda-11.2/include/thrust/system/cuda/config.h:78:2: error: #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
         #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
          ^~~~~
        In file included from /usr/local/cuda-11.2/include/thrust/system/cuda/detail/execution_policy.h:33:0,
                         from /usr/local/cuda-11.2/include/thrust/iterator/detail/device_system_tag.h:23,
                         from /usr/local/cuda-11.2/include/thrust/iterator/iterator_traits.h:111,
                         from /usr/local/cuda-11.2/include/thrust/detail/type_traits/pointer_traits.h:23,
                         from /usr/local/cuda-11.2/include/thrust/type_traits/is_contiguous_iterator.h:27,
                         from /usr/local/cuda-11.2/include/thrust/type_traits/is_trivially_relocatable.h:19,
                         from /usr/local/cuda-11.2/include/thrust/detail/complex/complex.inl:20,
                         from /usr/local/cuda-11.2/include/thrust/complex.h:1031,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/util/complex.h:8,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/util/Half.h:14,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/core/ScalarType.h:5,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/core/Scalar.h:11,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Operators.h:13,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                         from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3,
                         from /data/user/code/H3/csrc/fftconv/fftconv_cuda.cu:3:
        /usr/local/cuda-11.2/include/thrust/system/cuda/config.h:78:2: error: #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
         #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
          ^~~~~
        ninja: build stopped: subcommand failed.
        Traceback (most recent call last):
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
            subprocess.run(
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/subprocess.py", line 516, in run
            raise CalledProcessError(retcode, process.args,
        subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

        The above exception was the direct cause of the following exception:

        Traceback (most recent call last):
          File "<string>", line 2, in <module>
          File "<pip-setuptools-caller>", line 34, in <module>
          File "/data/user/code/H3/csrc/fftconv/setup.py", line 116, in <module>
            setup(
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
            return distutils.core.setup(**attrs)
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/core.py", line 148, in setup
            dist.run_commands()
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/dist.py", line 966, in run_commands
            self.run_command(cmd)
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/dist.py", line 985, in run_command
            cmd_obj.run()
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
            self.install_for_development()
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/develop.py", line 114, in install_for_development
            self.run_command('build_ext')
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/dist.py", line 985, in run_command
            cmd_obj.run()
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
            _build_ext.run(self)
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/command/build_ext.py", line 340, in run
            self.build_extensions()
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
            build_ext.build_extensions(self)
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
            self._build_extensions_serial()
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
            self.build_extension(ext)
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
            _build_ext.build_extension(self, ext)
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
            objects = self.compiler.compile(sources,
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 556, in unix_wrap_ninja_compile
            _write_ninja_file_and_compile_objects(
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1399, in _write_ninja_file_and_compile_objects
            _run_ninja_build(
          File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
            raise RuntimeError(message) from e
        RuntimeError: Error compiling objects for extension
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Can't roll back fftconv; was not uninstalled
error: subprocess-exited-with-error

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [144 lines of output]

    torch.__version__  = 1.10.0+cu111

    running develop
    running egg_info
    writing fftconv.egg-info/PKG-INFO
    writing dependency_links to fftconv.egg-info/dependency_links.txt
    writing top-level names to fftconv.egg-info/top_level.txt
    reading manifest file 'fftconv.egg-info/SOURCES.txt'
    writing manifest file 'fftconv.egg-info/SOURCES.txt'
    running build_ext
    building 'fftconv' extension
    Emitting ninja build file /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/build.ninja...
    Compiling objects...
    Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
    /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/easy_install.py:156: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py:782: UserWarning: The detected CUDA version (11.2) has a minor version mismatch with the version that was used to compile PyTorch (11.1). Most likely this shouldn't be a problem.
      warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
    [1/2] /usr/bin/g++-7 -MMD -MF /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/user/code/H3/csrc/fftconv/mathdx/22.02/include -I/data/user/code/H3/csrc/fftconv/cub-1.17.2 -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/TH -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.2/include -I/home/company/user/anaconda3/envs/lra/include/python3.8 -c -c /data/user/code/H3/csrc/fftconv/fftconv.cpp -o /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o -g -march=native -funroll-loops -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fftconv -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
    FAILED: /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o
    /usr/bin/g++-7 -MMD -MF /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/user/code/H3/csrc/fftconv/mathdx/22.02/include -I/data/user/code/H3/csrc/fftconv/cub-1.17.2 -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/TH -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.2/include -I/home/company/user/anaconda3/envs/lra/include/python3.8 -c -c /data/user/code/H3/csrc/fftconv/fftconv.cpp -o /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv.o -g -march=native -funroll-loops -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fftconv -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /data/user/code/H3/csrc/fftconv/fftconv.cpp:6:10: fatal error: cuda/std/complex: No such file or directory
     #include <cuda/std/complex>
              ^~~~~~~~~~~~~~~~~~
    compilation terminated.
    [2/2] /usr/local/cuda-11.2/bin/nvcc  -I/data/user/code/H3/csrc/fftconv/mathdx/22.02/include -I/data/user/code/H3/csrc/fftconv/cub-1.17.2 -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/TH -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.2/include -I/home/company/user/anaconda3/envs/lra/include/python3.8 -c -c /data/user/code/H3/csrc/fftconv/fftconv_cuda.cu -o /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --threads 4 -lineinfo --use_fast_math -std=c++17 -arch=compute_70 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fftconv -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /usr/bin/gcc-7
    FAILED: /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv_cuda.o
    /usr/local/cuda-11.2/bin/nvcc  -I/data/user/code/H3/csrc/fftconv/mathdx/22.02/include -I/data/user/code/H3/csrc/fftconv/cub-1.17.2 -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/TH -I/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.2/include -I/home/company/user/anaconda3/envs/lra/include/python3.8 -c -c /data/user/code/H3/csrc/fftconv/fftconv_cuda.cu -o /data/user/code/H3/csrc/fftconv/build/temp.linux-x86_64-3.8/fftconv_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --threads 4 -lineinfo --use_fast_math -std=c++17 -arch=compute_70 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fftconv -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /usr/bin/gcc-7
    In file included from /usr/local/cuda-11.2/include/thrust/system/cuda/detail/execution_policy.h:33:0,
                     from /usr/local/cuda-11.2/include/thrust/iterator/detail/device_system_tag.h:23,
                     from /usr/local/cuda-11.2/include/thrust/iterator/iterator_traits.h:111,
                     from /usr/local/cuda-11.2/include/thrust/detail/type_traits/pointer_traits.h:23,
                     from /usr/local/cuda-11.2/include/thrust/type_traits/is_contiguous_iterator.h:27,
                     from /usr/local/cuda-11.2/include/thrust/type_traits/is_trivially_relocatable.h:19,
                     from /usr/local/cuda-11.2/include/thrust/detail/complex/complex.inl:20,
                     from /usr/local/cuda-11.2/include/thrust/complex.h:1031,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/util/complex.h:8,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/util/Half.h:14,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/core/ScalarType.h:5,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/core/Scalar.h:11,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Operators.h:13,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3,
                     from /data/user/code/H3/csrc/fftconv/fftconv_cuda.cu:3:
    /usr/local/cuda-11.2/include/thrust/system/cuda/config.h:78:2: error: #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
     #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
      ^~~~~
    In file included from /usr/local/cuda-11.2/include/thrust/system/cuda/detail/execution_policy.h:33:0,
                     from /usr/local/cuda-11.2/include/thrust/iterator/detail/device_system_tag.h:23,
                     from /usr/local/cuda-11.2/include/thrust/iterator/iterator_traits.h:111,
                     from /usr/local/cuda-11.2/include/thrust/detail/type_traits/pointer_traits.h:23,
                     from /usr/local/cuda-11.2/include/thrust/type_traits/is_contiguous_iterator.h:27,
                     from /usr/local/cuda-11.2/include/thrust/type_traits/is_trivially_relocatable.h:19,
                     from /usr/local/cuda-11.2/include/thrust/detail/complex/complex.inl:20,
                     from /usr/local/cuda-11.2/include/thrust/complex.h:1031,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/util/complex.h:8,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/util/Half.h:14,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/core/ScalarType.h:5,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/c10/core/Scalar.h:11,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Operators.h:13,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                     from /home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3,
                     from /data/user/code/H3/csrc/fftconv/fftconv_cuda.cu:3:
    /usr/local/cuda-11.2/include/thrust/system/cuda/config.h:78:2: error: #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
     #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
      ^~~~~
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
        subprocess.run(
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "/data/user/code/H3/csrc/fftconv/setup.py", line 116, in <module>
        setup(
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
        self.install_for_development()
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/develop.py", line 114, in install_for_development
        self.run_command('build_ext')
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
        build_ext.build_extensions(self)
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
        _build_ext.build_extension(self, ext)
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
        objects = self.compiler.compile(sources,
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 556, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1399, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/home/company/user/anaconda3/envs/lra/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
Doraemonzzz commented 1 year ago

There are some problems with the CUB header files under torch 1.10.0+cu111. To work around them, I downloaded cub-1.17.2 into the fftconv directory. My working directory is as follows:

build       fftconv.cpp      fftconv.egg-info   lut_code_gen.py  map.h   setup.py         tmp.log
cub-1.17.2  fftconv_cuda.cu  launch_fftconv.py  lut.h            mathdx  static_switch.h  twiddle.cuh
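The compiler message itself offers an alternative to vendoring CUB: define `THRUST_IGNORE_CUB_VERSION_CHECK`. The include chain in the log shows the error is hit while nvcc compiles `fftconv_cuda.cu`, so this minimal sketch adds the define to the `nvcc` flags of a `setup.py`-style `extra_compile_args` dict. The helper `with_cub_check_disabled` is hypothetical and not part of the H3 repo. The define only silences Thrust's check; it does not make an incompatible CUB compatible, so matching the toolkit remains the more robust fix.

```python
import copy

CUB_DEFINE = "-DTHRUST_IGNORE_CUB_VERSION_CHECK"


def with_cub_check_disabled(extra_compile_args):
    """Return a copy of a setup.py-style extra_compile_args dict whose
    'nvcc' flags also define THRUST_IGNORE_CUB_VERSION_CHECK.

    Hypothetical helper: it suppresses Thrust's CUB version check only.
    """
    args = copy.deepcopy(extra_compile_args)  # never mutate the caller's dict
    nvcc_flags = args.setdefault("nvcc", [])
    if CUB_DEFINE not in nvcc_flags:  # avoid adding the define twice
        nvcc_flags.append(CUB_DEFINE)
    return args


if __name__ == "__main__":
    original = {
        "cxx": ["-g", "-march=native", "-funroll-loops"],
        "nvcc": ["-O3", "--threads", "4", "-lineinfo", "--use_fast_math",
                 "-std=c++17", "-arch=compute_70"],
    }
    patched = with_cub_check_disabled(original)
    print(patched["nvcc"][-1])  # -DTHRUST_IGNORE_CUB_VERSION_CHECK
    print(CUB_DEFINE in original["nvcc"])  # False: the input is untouched
```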
Doraemonzzz commented 1 year ago

I think most of these problems are caused by the torch and CUDA versions. Could you share the versions you use, or export the H3 conda environment as a YAML file? That would be very helpful.
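When comparing environments, two numbers matter most for building this extension: the CUDA version PyTorch was built with (the `+cuXXX` suffix of `torch.__version__`, e.g. `1.10.0+cu111` in the log) and the release of the `nvcc` that actually compiles the kernel. This minimal, stdlib-only sketch parses both strings and reports whether they agree. The function names are hypothetical. In practice the inputs would come from `torch.__version__` and the output of `nvcc --version`; the nvcc string in the usage lines is illustrative, not captured from the poster's machine.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def cuda_from_torch_version(torch_version: str) -> Optional[Version]:
    """Parse the CUDA tag of a torch version string, e.g. '1.13.1+cu117' -> (11, 7).

    The last digit of the tag is the minor version and the rest is the major.
    Returns None for CPU-only builds or strings without a '+cuXXX' suffix.
    """
    match = re.search(r"\+cu(\d{2,})$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


def cuda_from_nvcc_output(nvcc_output: str) -> Optional[Version]:
    """Parse 'release X.Y' from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def describe_mismatch(torch_version: str, nvcc_output: str) -> str:
    """Classify how torch's CUDA build relates to the local nvcc release."""
    torch_cuda = cuda_from_torch_version(torch_version)
    nvcc_cuda = cuda_from_nvcc_output(nvcc_output)
    if torch_cuda is None or nvcc_cuda is None:
        return "unknown"
    if torch_cuda == nvcc_cuda:
        return "match"
    if torch_cuda[0] == nvcc_cuda[0]:
        return "minor mismatch"
    return "major mismatch"


if __name__ == "__main__":
    nvcc_out = "Cuda compilation tools, release 11.2"  # illustrative string
    print(describe_mismatch("1.10.0+cu111", nvcc_out))  # minor mismatch
```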

DanFu09 commented 1 year ago

PyTorch version: 1.13.1
CUDA version: 11.7

We use Docker for everything; the FlashAttention Dockerfile should compile everything correctly: https://github.com/HazyResearch/flash-attention/blob/main/training/Dockerfile

(That one may install a different version of flash-attention than the one this code depends on.)

Doraemonzzz commented 1 year ago

Hi, thanks for your suggestion. I changed the environment to torch 1.13.1 and CUDA 11.7, but now I run into a new problem:

[1/3] /opt/rh/devtoolset-9/root/usr/bin/g++ -MMD -MF fftconv.o.d -DTORCH_EXTENSION_NAME=tno_causal_v12 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/nvme/user/code/fast-tno/src/fftconv/mathdx/22.02/include -I/nvme/share/cuda-11.8/include -isystem /nvme/user/miniconda3/envs/tno/lib/python3.8/site-packages/torch/include -isystem /nvme/user/miniconda3/envs/tno/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /nvme/user/miniconda3/envs/tno/lib/python3.8/site-packages/torch/include/TH -isystem /nvme/user/miniconda3/envs/tno/lib/python3.8/site-packages/torch/include/THC -isystem /nvme/share/cuda-11.8/include -isystem /nvme/user/miniconda3/envs/tno/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -g -march=native -funroll-loops -c /nvme/user/code/fast-tno/src/fftconv/fftconv.cpp -o fftconv.o 
FAILED: fftconv.o 
/opt/rh/devtoolset-9/root/usr/bin/g++ -MMD -MF fftconv.o.d -DTORCH_EXTENSION_NAME=tno_causal_v12 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/nvme/user/code/fast-tno/src/fftconv/mathdx/22.02/include -I/nvme/share/cuda-11.8/include -isystem /nvme/user/miniconda3/envs/tno/lib/python3.8/site-packages/torch/include -isystem /nvme/user/miniconda3/envs/tno/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /nvme/user/miniconda3/envs/tno/lib/python3.8/site-packages/torch/include/TH -isystem /nvme/user/miniconda3/envs/tno/lib/python3.8/site-packages/torch/include/THC -isystem /nvme/share/cuda-11.8/include -isystem /nvme/user/miniconda3/envs/tno/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -g -march=native -funroll-loops -c /nvme/user/code/fast-tno/src/fftconv/fftconv.cpp -o fftconv.o 
/nvme/user/code/fast-tno/src/fftconv/fftconv.cpp:8:10: fatal error: cuda_fp16.h: No such file or directory
    8 | #include <cuda_fp16.h>
      |          ^~~~~~~~~~~~~

My environment variables are as follows:

export CUDA_HOME=/nvme/share/cuda-11.8
export PATH=$CUDA_HOME/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=$CUDA_HOME/lib64
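`cuda_fp16.h` ships with the CUDA toolkit, so this error means that none of the directories on the compiler's search path contains it, even though `-I/nvme/share/cuda-11.8/include` is on the command line. A quick check is to search the candidate include directories under `CUDA_HOME` for the header. This minimal sketch assumes only the two layouts that appear in this thread (`include` and `targets/x86_64-linux/include`). `find_cuda_header` is a hypothetical helper.

```python
import os
from typing import List

# Candidate header locations relative to CUDA_HOME; both appear in this thread.
CANDIDATE_INCLUDE_SUBDIRS = ["include", os.path.join("targets", "x86_64-linux", "include")]


def find_cuda_header(cuda_home: str, header: str = "cuda_fp16.h") -> List[str]:
    """Return every candidate include directory under cuda_home that contains header."""
    hits = []
    for subdir in CANDIDATE_INCLUDE_SUBDIRS:
        include_dir = os.path.join(cuda_home, subdir)
        if os.path.isfile(os.path.join(include_dir, header)):
            hits.append(include_dir)
    return hits


if __name__ == "__main__":
    # Assumed fallback when CUDA_HOME is unset; adjust for your machine.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    found = find_cuda_header(cuda_home)
    if found:
        print("cuda_fp16.h found in:", *found, sep="\n  ")
    else:
        print(f"cuda_fp16.h not found under {cuda_home}; check CUDA_HOME")
```

If the header only turns up under `targets/x86_64-linux/include`, that directory needs to reach the compiler, for example through `include_dirs` in `setup.py`.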
DanFu09 commented 1 year ago

We provide a Dockerfile here; could you try it? https://github.com/HazyResearch/H3/blob/main/Dockerfile

In my experience, these compile errors usually mean that something is misconfigured in the CUDA installation or the NVIDIA drivers. I believe NVIDIA has more information about the versions in each Docker image here: https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html (we use nvcr.io/nvidia/pytorch:22.11-py3).

Please let us know if this helps!

Doraemonzzz commented 1 year ago

Thanks for your suggestions; all problems are solved now.

Doraemonzzz commented 1 year ago

My solution is slightly different, and I list it here for the convenience of others.

  1. Update setup.py from:
    ext_modules.append(
    CUDAExtension(
        'fftconv', [
            'fftconv.cpp',
            'fftconv_cuda.cu',
        ],
        extra_compile_args={'cxx': ['-g', '-march=native', '-funroll-loops'],
                            'nvcc': ['-O3', '--threads', '4', '-lineinfo', '--use_fast_math', '-std=c++17', '-arch=compute_70']
        # extra_compile_args={'cxx': ['-O3'],
        #                     'nvcc': append_nvcc_threads(['-O3', '-lineinfo', '--use_fast_math', '-std=c++17'] + cc_flag)
                            },
        include_dirs=[os.path.join(this_dir, 'mathdx/22.02/include'),
                      ]
    )
    )

    to

    cuda_dir = os.environ["CUDA_HOME"]

    ext_modules.append(
    CUDAExtension(
        'fftconv', [
            'fftconv.cpp',
            'fftconv_cuda.cu',
        ],
        extra_compile_args={'cxx': ['-g', '-march=native', '-funroll-loops'],
                            'nvcc': ['-O3', '--threads', '4', '-lineinfo', '--use_fast_math', '-std=c++17', '-arch=compute_70']
        # extra_compile_args={'cxx': ['-O3'],
        #                     'nvcc': append_nvcc_threads(['-O3', '-lineinfo', '--use_fast_math', '-std=c++17'] + cc_flag)
                            },
        include_dirs=[os.path.join(this_dir, 'mathdx/22.02/include'),
                      os.path.join(cuda_dir, "targets/x86_64-linux/include")
                      ]
    )
    )

  2. Most problems are caused by the CUDA/torch versions and environment variables. Mine are as follows:
torch version:

torch: 1.13.1+cu117

env variables:

export CUDA_HOME=/nvme/share/cuda-11.8
export PATH=$CUDA_HOME/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:${CUDA_HOME}/targets/x86_64-linux/include:$CUDA_HOME/include
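Two details of the setup in this list can be made more robust. First, `os.environ["CUDA_HOME"]` raises `KeyError` when the variable is unset. Second, `LD_LIBRARY_PATH` is read by the dynamic loader when it locates shared libraries, not by the compiler when it locates headers. The include directories appended to it therefore should not affect header lookup; the header fix comes from `include_dirs` in `setup.py`.

This minimal sketch assumes the `this_dir` variable and `mathdx/22.02/include` layout from the `setup.py` snippet. `fftconv_include_dirs`, `cuda_build_env`, and the `/usr/local/cuda` fallback are hypothetical and not part of the H3 repo. `_prepend` mirrors the `${PATH:+:${PATH}}` idiom, which adds `:$PATH` only when `PATH` is set and non-empty.

```python
import os
from typing import Dict, List, Mapping, Optional


def fftconv_include_dirs(this_dir: str, cuda_home: str) -> List[str]:
    """Hypothetical helper: include_dirs for the fftconv CUDAExtension.

    Keeps the repo's mathdx headers and adds CUDA_HOME/targets/x86_64-linux/include
    only when that directory exists on this machine.
    """
    dirs = [os.path.join(this_dir, "mathdx/22.02/include")]
    targets_include = os.path.join(cuda_home, "targets", "x86_64-linux", "include")
    if os.path.isdir(targets_include):
        dirs.append(targets_include)
    return dirs


def _prepend(entry: str, existing: Optional[str]) -> str:
    """Python version of `entry${VAR:+:${VAR}}`: add ':existing' only if non-empty."""
    return entry + (os.pathsep + existing if existing else "")


def cuda_build_env(cuda_home: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy of base (default os.environ) set up for cuda_home."""
    env = dict(os.environ if base is None else base)
    env["CUDA_HOME"] = cuda_home
    env["PATH"] = _prepend(os.path.join(cuda_home, "bin"), env.get("PATH"))
    # Shared-library directories only: the dynamic loader reads LD_LIBRARY_PATH,
    # the compiler does not search it for headers.
    env["LD_LIBRARY_PATH"] = _prepend(os.path.join(cuda_home, "lib64"), env.get("LD_LIBRARY_PATH"))
    return env


if __name__ == "__main__":
    # Assumed fallback when CUDA_HOME is unset; adjust for your machine.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    print(fftconv_include_dirs(os.getcwd(), cuda_home))
    print(cuda_build_env("/nvme/share/cuda-11.8")["PATH"].split(os.pathsep)[0])  # /nvme/share/cuda-11.8/bin
```

In `setup.py`, `include_dirs=fftconv_include_dirs(this_dir, cuda_dir)` could replace the literal list. The result of `cuda_build_env(...)` can be passed as `env=` to `subprocess.run(["pip", "install", "-e", "."], cwd="csrc/fftconv", check=True)`. Unlike the exports above, the sketch keeps any existing `LD_LIBRARY_PATH` entries after `lib64`. PyTorch's `torch.utils.cpp_extension` module also exposes the `CUDA_HOME` it detected, which is worth comparing against your own setting.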