XuezheMax / megalodon

Reference implementation of Megalodon 7B model
MIT License

Failed to install megalodon on V100. #2

Closed: WailordHe closed this issue 2 months ago

WailordHe commented 2 months ago

Installation fails on a V100 with PyTorch 2.0.1 and CUDA 11.7. Is this because the V100 doesn't support bf16, or is it a problem with ninja?

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [159 lines of output]
    running develop
    running egg_info
    writing megalodon.egg-info/PKG-INFO
    writing dependency_links to megalodon.egg-info/dependency_links.txt
    writing top-level names to megalodon.egg-info/top_level.txt
    reading manifest file 'megalodon.egg-info/SOURCES.txt'
    adding license file 'LICENSE'
    writing manifest file 'megalodon.egg-info/SOURCES.txt'
    running build_ext
    building 'megalodon_extension' extension
    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
    !!

            ********************************************************************************
            Please avoid running ``setup.py`` and ``easy_install``.
            Instead, use pypa/build, pypa/installer or other
            standards-based tools.

            See https://github.com/pypa/setuptools/issues/917 for details.
            ********************************************************************************

    !!
      easy_install.initialize_options(self)
    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
    !!

            ********************************************************************************
            Please avoid running ``setup.py`` directly.
            Instead, use pypa/build, pypa/installer or other
            standards-based tools.

            See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
            ********************************************************************************

    !!
      self.initialize_options()
    Emitting ninja build file /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/build.ninja...
    Compiling objects...
    Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
    [1/5] c++ -MMD -MF /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/blas.o.d -pthread -B /home/ma-user/anaconda3/envs/megalodon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ma-user/work/megalodon/megalodon/csrc -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/TH -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/ma-user/anaconda3/envs/megalodon/include/python3.8 -c -c /home/ma-user/work/megalodon/megalodon/csrc/blas.cc -o /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/blas.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=megalodon_extension -D_GLIBCXX_USE_CXX11_ABI=0
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    [2/5] /usr/local/cuda/bin/nvcc  -I/home/ma-user/work/megalodon/megalodon/csrc -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/TH -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/ma-user/anaconda3/envs/megalodon/include/python3.8 -c -c /home/ma-user/work/megalodon/megalodon/csrc/ops/attention_softmax_kernel.cu -o /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/ops/attention_softmax_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 --expt-relaxed-constexpr --expt-extended-lambda --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=megalodon_extension -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70
    FAILED: /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/ops/attention_softmax_kernel.o
    /usr/local/cuda/bin/nvcc  -I/home/ma-user/work/megalodon/megalodon/csrc -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/TH -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/ma-user/anaconda3/envs/megalodon/include/python3.8 -c -c /home/ma-user/work/megalodon/megalodon/csrc/ops/attention_softmax_kernel.cu -o /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/ops/attention_softmax_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 --expt-relaxed-constexpr --expt-extended-lambda --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=megalodon_extension -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70
    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
              detected during:
                instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
    (61): here
                instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/ATen/core/qualified_name.h(73): here

    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
              detected during:
                instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
    (61): here
                instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h(77): here

    /home/ma-user/work/megalodon/megalodon/csrc/register_utils.cuh(222): error: identifier "make_bfloat162" is undefined
              detected during:
                instantiation of "void megalodon::register_utils::Save<SRC,DST,kElementsPerThread>(const SRC *, int64_t, DST *) [with SRC=float, DST=c10::BFloat16, kElementsPerThread=1L]"
    /home/ma-user/work/megalodon/megalodon/csrc/softmax.cuh(130): here
                instantiation of "void megalodon::softmax::AttentionSoftmaxFwdKernel<T,T_ACC,kCapacity,kNumThreads>(int64_t, int64_t, __nv_bool, const T *, T *) [with T=c10::BFloat16, T_ACC=float, kCapacity=32L, kNumThreads=32L]"
    /home/ma-user/work/megalodon/megalodon/csrc/ops/attention_softmax_kernel.cu(43): here
                instantiation of "void megalodon::ops::<unnamed>::AttentionSoftmaxCUDAFwdImpl<T>(const at::Tensor &, double, __nv_bool, at::Tensor &) [with T=c10::BFloat16]"
    /home/ma-user/work/megalodon/megalodon/csrc/ops/attention_softmax_kernel.cu(100): here

    1 error detected in the compilation of "/home/ma-user/work/megalodon/megalodon/csrc/ops/attention_softmax_kernel.cu".
    [3/5] /usr/local/cuda/bin/nvcc  -I/home/ma-user/work/megalodon/megalodon/csrc -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/TH -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/ma-user/anaconda3/envs/megalodon/include/python3.8 -c -c /home/ma-user/work/megalodon/megalodon/csrc/ops/attention_kernel.cu -o /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/ops/attention_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 --expt-relaxed-constexpr --expt-extended-lambda --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=megalodon_extension -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70
    FAILED: /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/ops/attention_kernel.o
    /usr/local/cuda/bin/nvcc  -I/home/ma-user/work/megalodon/megalodon/csrc -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/TH -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/ma-user/anaconda3/envs/megalodon/include/python3.8 -c -c /home/ma-user/work/megalodon/megalodon/csrc/ops/attention_kernel.cu -o /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/ops/attention_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 --expt-relaxed-constexpr --expt-extended-lambda --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=megalodon_extension -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70
    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
              detected during:
                instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
    (61): here
                instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/ATen/core/qualified_name.h(73): here

    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
              detected during:
                instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
    (61): here
                instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
    /home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h(77): here

    /home/ma-user/work/megalodon/megalodon/csrc/register_utils.cuh(222): error: identifier "make_bfloat162" is undefined
              detected during:
                instantiation of "void megalodon::register_utils::Save<SRC,DST,kElementsPerThread>(const SRC *, int64_t, DST *) [with SRC=float, DST=c10::BFloat16, kElementsPerThread=1L]"
    /home/ma-user/work/megalodon/megalodon/csrc/softmax.cuh(130): here
                instantiation of "void megalodon::softmax::AttentionSoftmaxFwdKernel<T,T_ACC,kCapacity,kNumThreads>(int64_t, int64_t, __nv_bool, const T *, T *) [with T=c10::BFloat16, T_ACC=float, kCapacity=32L, kNumThreads=32L]"
    /home/ma-user/work/megalodon/megalodon/csrc/ops/attention_kernel.cu(95): here
                instantiation of "void megalodon::ops::<unnamed>::AttentionCUDAFwdImpl<T>(const at::Tensor &, const at::Tensor &, const at::Tensor &, double, double, __nv_bool, at::Tensor &, at::Tensor &) [with T=c10::BFloat16]"
    /home/ma-user/work/megalodon/megalodon/csrc/ops/attention_kernel.cu(273): here

    1 error detected in the compilation of "/home/ma-user/work/megalodon/megalodon/csrc/ops/attention_kernel.cu".
    [4/5] c++ -MMD -MF /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/megalodon_extension.o.d -pthread -B /home/ma-user/anaconda3/envs/megalodon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ma-user/work/megalodon/megalodon/csrc -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/TH -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/ma-user/anaconda3/envs/megalodon/include/python3.8 -c -c /home/ma-user/work/megalodon/megalodon/csrc/megalodon_extension.cc -o /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/megalodon_extension.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=megalodon_extension -D_GLIBCXX_USE_CXX11_ABI=0
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    [5/5] c++ -MMD -MF /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/ops/attention.o.d -pthread -B /home/ma-user/anaconda3/envs/megalodon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ma-user/work/megalodon/megalodon/csrc -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/TH -I/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/ma-user/anaconda3/envs/megalodon/include/python3.8 -c -c /home/ma-user/work/megalodon/megalodon/csrc/ops/attention.cc -o /home/ma-user/work/megalodon/build/temp.linux-x86_64-cpython-38/home/ma-user/work/megalodon/megalodon/csrc/ops/attention.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=megalodon_extension -D_GLIBCXX_USE_CXX11_ABI=0
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
        subprocess.run(
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "/home/ma-user/work/megalodon/setup.py", line 65, in <module>
        main()
      File "/home/ma-user/work/megalodon/setup.py", line 48, in main
        setup(
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/__init__.py", line 103, in setup
        return distutils.core.setup(**attrs)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
        return run_commands(dist)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
        dist.run_commands()
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
        self.run_command(cmd)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
        super().run_command(command)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
        self.install_for_development()
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/command/develop.py", line 109, in install_for_development
        self.run_command('build_ext')
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
        self.distribution.run_command(command)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
        super().run_command(command)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 88, in run
        _build_ext.run(self)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
        self.build_extensions()
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
        build_ext.build_extensions(self)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
        self._build_extensions_serial()
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
        self.build_extension(ext)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
        _build_ext.build_extension(self, ext)
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
        objects = self.compiler.compile(
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/home/ma-user/anaconda3/envs/megalodon/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    [end of output]

violet-zct commented 2 months ago

So far, we have only tested our code on A100 GPUs. Sorry about that.

xiaomengy commented 2 months ago

Hi, thanks for letting us know about this. Let me clarify a little: there are actually two separate issues here.

  1. The build error says that the identifier "make_bfloat162" is undefined. The function make_bfloat162 was added in CUDA 12.2.0, so this is probably because your CUDA version is older than that. A reference can be found here.
  2. We run our jobs on A100 GPUs. Compared to the A100, the V100 has less shared memory (96 KB vs. 164 KB) (reference) and fewer other compute resources, so you may not be able to run our custom operators on long sequences even after upgrading to a newer version of CUDA.

I'm sorry for the inconvenience. We implemented our custom PyTorch operators around the compute resources of our own machines, so it is hard to make them support every GPU.
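
For anyone who wants to check their setup before building, here is a minimal sketch of the two checks described above. It assumes the 12.2 minimum stated in this comment; the helper names (`parse_version`, `cuda_is_new_enough`, `describe_environment`) are made up for illustration and are not part of the megalodon codebase. Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which may differ from the `nvcc` under `/usr/local/cuda` that actually compiles the extension, so it is worth running `nvcc --version` as well.

```python
# Minimum CUDA toolkit version, taken from the maintainer's comment (assumption).
MIN_CUDA = (12, 2)


def parse_version(text):
    """Turn a version string such as '11.7' into a (major, minor) tuple."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def cuda_is_new_enough(cuda_version, minimum=MIN_CUDA):
    """Return True if cuda_version (e.g. '12.2') is at least `minimum`."""
    return parse_version(cuda_version) >= minimum


def describe_environment():
    """Report PyTorch's CUDA version and the compute capability of GPU 0."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.version.cuda is None or not torch.cuda.is_available():
        return "No CUDA-enabled PyTorch build or no visible GPU"
    # Compute capability is (7, 0) on V100 and (8, 0) on A100.
    major, minor = torch.cuda.get_device_capability(0)
    ok = cuda_is_new_enough(torch.version.cuda)
    return (
        f"CUDA {torch.version.cuda} (new enough: {ok}), "
        f"compute capability {major}.{minor}"
    )


if __name__ == "__main__":
    print(describe_environment())
```

With the setup from the original report, `cuda_is_new_enough("11.7")` returns `False`, which matches the build failure in the log.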

WailordHe commented 2 months ago

Thank you @xiaomengy and @violet-zct, I've successfully installed it on A100 GPUs. Do you have any simple example training code? The pseudocode seems to be missing some necessary elements.

XuezheMax commented 2 months ago

Hi @WailordHe,

For the training code, the main missing component is the data loader, which we are not allowed to share due to licensing restrictions. My suggestion is to merge your own data loader into this codebase. Thanks.
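
As a starting point for writing your own loader, here is a generic sketch of a next-token-prediction batcher in plain Python. It is not the project's data loader, and the interface the Megalodon training code actually expects is not described in this thread; the function name `make_batches` and its parameters are hypothetical and would need to be adapted to the codebase.

```python
def make_batches(token_ids, seq_len, batch_size):
    """Yield (inputs, targets) batches for next-token prediction.

    Each example is a window of seq_len tokens; its targets are the same
    window shifted one position to the right.
    """
    examples = []
    # Step by seq_len so windows do not overlap; each needs seq_len + 1 tokens.
    for start in range(0, len(token_ids) - seq_len, seq_len):
        window = token_ids[start:start + seq_len + 1]
        examples.append((window[:-1], window[1:]))
    # Group examples into full batches, dropping any incomplete final batch.
    for i in range(0, len(examples) - batch_size + 1, batch_size):
        chunk = examples[i:i + batch_size]
        inputs = [x for x, _ in chunk]
        targets = [y for _, y in chunk]
        yield inputs, targets
```

In a real training run the nested lists would be converted to tensors on the target device, and the token ids would come from your own tokenized corpus.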