OpenBMB / CPM-Bee

A Chinese-English bilingual foundation model with ten billion parameters
2.68k stars 211 forks

The CUDA version is 12.1 rather than 11.8; how can this be resolved? #35

Open trphoenix opened 1 year ago

trphoenix commented 1 year ago

`RuntimeError: The detected CUDA version (12.1) mismatches the version that was used to compile PyTorch (11.8). Please make sure to use the same CUDA versions.`

The CUDA version on my machine, as reported by `nvcc -V`:

`nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0`

This machine also hosts other projects. Do I really have to downgrade CUDA just for this one project? Could something be changed so that this project supports CUDA 12?
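For readers hitting the same error: as far as I can tell, the check in `torch/utils/cpp_extension.py` compares the toolkit version reported by `nvcc` with `torch.version.cuda` and raises when the major versions differ. The comparison can be reproduced outside the build with a minimal sketch like the one below. The helper names are hypothetical, and the `"11.8"` value is hard-coded from the error message rather than read from `torch.version.cuda`.

```python
import re

# Sample text copied from the `nvcc -V` output quoted in this issue.
NVCC_OUTPUT = "Cuda compilation tools, release 12.1, V12.1.105"


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def majors_match(toolkit, torch_cuda):
    """Compare a parsed toolkit version with a string such as "11.8"."""
    torch_major = int(torch_cuda.split(".")[0])
    return toolkit[0] == torch_major


toolkit = parse_nvcc_release(NVCC_OUTPUT)
# Assumed PyTorch build version, taken from the error message above.
print(toolkit, majors_match(toolkit, "11.8"))
```

On a real machine the sample string could be replaced by the captured output of `nvcc -V`, and the hard-coded version by `torch.version.cuda`.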

trphoenix commented 1 year ago

`(cpmbee) aiuser@aiuser-virtual-machine:~/worker/CPM-Bee/src$ pip install bmtrain --no-cache-dir
Collecting bmtrain
  Downloading bmtrain-0.2.2.tar.gz (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.7/58.7 kB 131.8 kB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in /home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages (from bmtrain) (1.24.1)
Building wheels for collected packages: bmtrain
  Building wheel for bmtrain (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [100 lines of output]
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-310
  creating build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/param_init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/global_var.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/synchronize.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/parameter.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/store.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/wrapper.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/debug.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/block_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  creating build/lib.linux-x86_64-cpython-310/bmtrain/optim
  copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
  copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
  copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
  copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
  creating build/lib.linux-x86_64-cpython-310/bmtrain/nccl
  copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
  copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
  creating build/lib.linux-x86_64-cpython-310/bmtrain/distributed
  copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
  copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
  creating build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  creating build/lib.linux-x86_64-cpython-310/bmtrain/loss
  copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
  copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
  creating build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  creating build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  running build_ext
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/pip-install-8x63jkgl/bmtrain_0e1b06e1b3f94924861318fd74027a6e/setup.py", line 74, in <module>
      setup(
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/__init__.py", line 107, in setup
      return distutils.core.setup(**attrs)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
      super().run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 325, in run
      self.run_command("build")
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
      super().run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
      super().run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
      _build_ext.run(self)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
      self.build_extensions()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
      _check_cuda_version(compiler_name, compiler_version)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
      raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
  RuntimeError:
  The detected CUDA version (12.1) mismatches the version that was used to compile
  PyTorch (11.8). Please make sure to use the same CUDA versions.

  [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for bmtrain
  Running setup.py clean for bmtrain
Failed to build bmtrain
Installing collected packages: bmtrain
  Running setup.py install for bmtrain ... error
  error: subprocess-exited-with-error

  × Running setup.py install for bmtrain did not run successfully.
  │ exit code: 1
  ╰─> [115 lines of output]
  running install
  /home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
  !!

          ********************************************************************************
          Please avoid running ``setup.py`` directly.
          Instead, use pypa/build, pypa/installer, pypa/build or
          other standards-based tools.

          See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
          ********************************************************************************

  !!
    self.initialize_options()
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-310
  creating build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/param_init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/global_var.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/synchronize.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/parameter.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/store.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/wrapper.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/debug.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/block_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
  creating build/lib.linux-x86_64-cpython-310/bmtrain/optim
  copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
  copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
  copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
  copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
  creating build/lib.linux-x86_64-cpython-310/bmtrain/nccl
  copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
  copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
  creating build/lib.linux-x86_64-cpython-310/bmtrain/distributed
  copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
  copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
  creating build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
  creating build/lib.linux-x86_64-cpython-310/bmtrain/loss
  copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
  copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
  creating build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
  creating build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
  running build_ext
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/pip-install-8x63jkgl/bmtrain_0e1b06e1b3f94924861318fd74027a6e/setup.py", line 74, in <module>
      setup(
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/__init__.py", line 107, in setup
      return distutils.core.setup(**attrs)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
      super().run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/command/install.py", line 74, in run
      return orig.install.run(self)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/command/install.py", line 697, in run
      self.run_command('build')
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
      super().run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
      super().run_command(command)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
      _build_ext.run(self)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
      self.build_extensions()
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
      _check_cuda_version(compiler_name, compiler_version)
    File "/home/aiuser/anaconda3/envs/cpmbee/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
      raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
  RuntimeError:
  The detected CUDA version (12.1) mismatches the version that was used to compile
  PyTorch (11.8). Please make sure to use the same CUDA versions.

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> bmtrain

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
(cpmbee) aiuser@aiuser-virtual-machine:~/worker/CPM-Bee/src$ `

trphoenix commented 1 year ago

My GPUs:

(cpmbee) aiuser@aiuser-virtual-machine:~/worker/CPM-Bee/src$ nvidia-smi
Thu Jun 1 15:06:45 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4000                Off| 00000000:03:00.0 Off |                  Off |
| 38%   58C    P0               40W / 140W|      0MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A4000                Off| 00000000:13:00.0 Off |                  Off |
| 39%   58C    P0               36W / 140W|      0MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
(cpmbee) aiuser@aiuser-virtual-machine:~/worker/CPM-Bee/src$

zh-zheng commented 1 year ago

Work on adapting BMTrain to CUDA 12 is in progress.

zhebuduiba commented 1 year ago

Try this: TORCH_CUDA_ARCH_LIST="7.5" pip install bmtrain==0.2.1
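A note on the value used here: `TORCH_CUDA_ARCH_LIST` selects the GPU architectures the extension is compiled for, and `7.5` corresponds to Turing cards. The RTX A4000 shown in this thread is an Ampere card, which to my knowledge reports compute capability 8.6. A minimal sketch of building the value from capability tuples, such as those returned by `torch.cuda.get_device_capability()`, might look like this. The helper name `arch_list` is hypothetical, and the `(8, 6)` input is an assumption about this hardware, not something read from the machine.

```python
def arch_list(capabilities):
    """Format (major, minor) tuples as a TORCH_CUDA_ARCH_LIST value."""
    # Deduplicate so two identical cards yield a single entry.
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


# Two RTX A4000 cards, each assumed to report capability (8, 6).
print(arch_list([(8, 6), (8, 6)]))
```

The printed string can then be passed as the value of `TORCH_CUDA_ARCH_LIST` for the install command.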

acbogeh commented 1 year ago

Try this: TORCH_CUDA_ARCH_LIST="7.5" pip install bmtrain==0.2.1

      raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
  RuntimeError:
  The detected CUDA version (12.1) mismatches the version that was used to compile
  PyTorch (11.7). Please make sure to use the same CUDA versions.

xgsong commented 1 year ago

Try this: TORCH_CUDA_ARCH_LIST="7.5" pip install bmtrain==0.2.1

That does not help; I get the same error. bmtrain is not compatible with CUDA 12.
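This outcome is what one would expect, since `TORCH_CUDA_ARCH_LIST` only affects which architectures are compiled for and does not change which `nvcc` the build finds. One avenue that avoids downgrading the system toolkit is to install a second toolkit matching the PyTorch build side by side and point only this build at it through `CUDA_HOME`, which `torch.utils.cpp_extension` consults when locating the toolkit. The sketch below assumes a CUDA 11.8 toolkit at `/usr/local/cuda-11.8` (a hypothetical path) and has not been tested against bmtrain; `build_env` is an illustrative helper, not part of any library.

```python
import os


def build_env(cuda_home, arch_list, base=None):
    """Return a copy of the environment that points one build at cuda_home."""
    env = dict(os.environ if base is None else base)
    env["CUDA_HOME"] = cuda_home
    env["TORCH_CUDA_ARCH_LIST"] = arch_list
    # Put the chosen toolkit's nvcc ahead of any other nvcc on PATH.
    bin_dir = os.path.join(cuda_home, "bin")
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return env


# Hypothetical toolkit location; adjust to wherever CUDA 11.8 is installed.
env = build_env("/usr/local/cuda-11.8", "8.6")
print(env["CUDA_HOME"])
# To run the install with this environment only:
#   import subprocess, sys
#   subprocess.run([sys.executable, "-m", "pip", "install", "bmtrain"],
#                  env=env, check=True)
```

Because the environment is passed to a single subprocess, other projects on the machine keep using the CUDA 12.1 toolkit.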

zhaofeng45 commented 1 year ago

I gave up.