XuezheMax / megalodon

Reference implementation of Megalodon 7B model
MIT License

Cuda 11.8/12.1 #7

Closed by timmytwoteeth 1 month ago

timmytwoteeth commented 2 months ago

Hi,

Are there any incompatibilities with running Megalodon on CUDA 11.8 or 12.1?

Thank you.

XuezheMax commented 2 months ago

Would you please provide some details of your issue? Thanks.

timmytwoteeth commented 2 months ago

> Would you please provide some details of your issue? Thanks.

Hi @XuezheMax,

It may be a GCC or ninja error instead, but CUDA 11.8 was flagged during `pip install -e .`:

Obtaining file://///
  Preparing metadata (setup.py) ... done
Installing collected packages: megalodon
  Running setup.py develop for megalodon
    error: subprocess-exited-with-error

    × python setup.py develop did not run successfully.
    │ exit code: 1
    ╰─> [103 lines of output]
        running develop
        ///3/envs//lib/python3.10/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
        !!

                ********************************************************************************
                Please avoid running ``setup.py`` and ``easy_install``.
                Instead, use pypa/build, pypa/installer or other
                standards-based tools.

                See https://github.com/pypa/setuptools/issues/917 for details.
                ********************************************************************************

        !!
          easy_install.initialize_options(self)
        ///3/envs//lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
        !!

                ********************************************************************************
                Please avoid running ``setup.py`` directly.
                Instead, use pypa/build, pypa/installer or other
                standards-based tools.

                See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
                ********************************************************************************

        !!
          self.initialize_options()
        running egg_info
        writing megalodon.egg-info/PKG-INFO
        writing dependency_links to megalodon.egg-info/dependency_links.txt
        writing top-level names to megalodon.egg-info/top_level.txt
        reading manifest file 'megalodon.egg-info/SOURCES.txt'
        writing manifest file 'megalodon.egg-info/SOURCES.txt'
        running build_ext
        ///3/envs//lib/python3.10/site-packages/torch/utils/cpp_extension.py:428: UserWarning: There are no g++ version bounds defined for CUDA version 11.8
          warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
        building 'megalodon_extension' extension
        ///3/envs//lib/python3.10/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
        If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
          warnings.warn(
        Emitting ninja build file ////build/temp.linux-x86_64-cpython-310/build.ninja...
        Compiling objects...
        Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
        ninja: error: '////megalodon/csrc/blas.cc', needed by '////build/temp.linux-x86_64-cpython-310////megalodon/csrc/blas.o', missing and no known rule to make it
        Traceback (most recent call last):
          File "///3/envs//lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
            subprocess.run(
          File "///3/envs//lib/python3.10/subprocess.py", line 526, in run
            raise CalledProcessError(retcode, process.args,
        subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

        The above exception was the direct cause of the following exception:

        Traceback (most recent call last):
          File "<string>", line 2, in <module>
          File "<pip-setuptools-caller>", line 34, in <module>
          File "////setup.py", line 55, in <module>
            main()
          File "////setup.py", line 38, in main
            setup(
          File "///3/envs//lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
            return distutils.core.setup(**attrs)
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
            return run_commands(dist)
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
            dist.run_commands()
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
            self.run_command(cmd)
          File "///3/envs//lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
            super().run_command(command)
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
            cmd_obj.run()
          File "///3/envs//lib/python3.10/site-packages/setuptools/command/develop.py", line 34, in run
            self.install_for_development()
          File "///3/envs//lib/python3.10/site-packages/setuptools/command/develop.py", line 111, in install_for_development
            self.run_command('build_ext')
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
            self.distribution.run_command(command)
          File "///3/envs//lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
            super().run_command(command)
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
            cmd_obj.run()
          File "///3/envs//lib/python3.10/site-packages/setuptools/command/build_ext.py", line 91, in run
            _build_ext.run(self)
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
            self.build_extensions()
          File "///3/envs//lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 870, in build_extensions
            build_ext.build_extensions(self)
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
            self._build_extensions_serial()
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
            self.build_extension(ext)
          File "///3/envs//lib/python3.10/site-packages/setuptools/command/build_ext.py", line 252, in build_extension
            _build_ext.build_extension(self, ext)
          File "///3/envs//lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 560, in build_extension
            objects = self.compiler.compile(
          File "///3/envs//lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 683, in unix_wrap_ninja_compile
            _write_ninja_file_and_compile_objects(
          File "///3/envs//lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1783, in _write_ninja_file_and_compile_objects
            _run_ninja_build(
          File "///3/envs//lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
            raise RuntimeError(message) from e
        RuntimeError: Error compiling objects for extension
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.

Thank you.

XuezheMax commented 2 months ago

Hi @timmytwoteeth

It seems your error is unrelated to CUDA. I am not 100% sure about the cause, but I suggest you first uninstall ninja, then re-run `python setup.py develop`.
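
For reference, the ninja error in the log above says that `megalodon/csrc/blas.cc` is "missing and no known rule to make it", so it may be worth checking whether the source files are actually present in the checkout before rebuilding. The following is only a minimal sketch: the helper name `find_missing_sources` and the one-item source list are assumptions for illustration, and the real list of extension sources would have to be taken from the repository's `setup.py`.

```python
from pathlib import Path


def find_missing_sources(repo_root, sources):
    """Return the paths (relative to repo_root) that are not regular files on disk."""
    root = Path(repo_root)
    return [src for src in sources if not (root / src).is_file()]


if __name__ == "__main__":
    # Hypothetical source list: only blas.cc is named in the build log.
    sources = ["megalodon/csrc/blas.cc"]
    missing = find_missing_sources(".", sources)
    if missing:
        print("Missing source files:", ", ".join(missing))
    else:
        print("All listed source files are present.")
```

Run from the repository root, this reports any listed file that does not exist. If a file is reported missing, the problem is probably with the checkout or the paths passed to the extension rather than with CUDA or the compiler.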

timmytwoteeth commented 1 month ago

Will come back to this later after further investigation.

Thank you.