zhangs-a-n opened 2 weeks ago
I installed apex in a conda virtual environment. The command I used was:
pip install --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
The virtual environment uses PyTorch 2.3.1 with CUDA 12.1, and the operating system is Ubuntu 22.04 LTS.
When installing apex with `--config-settings "--build-option=--cpp_ext"` and `--config-settings "--build-option=--cuda_ext"`, you need gcc and a CUDA Toolkit whose version matches the CUDA version of PyTorch in the virtual environment. The CUDA Toolkit is installed on the system, not inside the virtual environment.
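Before starting a build that can take many minutes, it may help to confirm that both compilers are visible. The sketch below uses only the Python standard library and assumes the tools are looked up on `PATH`; the helper name `missing_build_tools` is hypothetical and not part of apex or PyTorch. PyTorch's extension builder may also locate the toolkit through the `CUDA_HOME` environment variable, so treat this as a rough check only.

```python
import shutil

# Compilers the --cpp_ext / --cuda_ext builds rely on (assumption: both are
# expected on PATH; a toolkit installed elsewhere may need PATH updated).
REQUIRED_TOOLS = ("gcc", "nvcc")


def missing_build_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("gcc and nvcc found on PATH")
```

If `nvcc` is reported missing even though the toolkit is installed, its `bin` directory (often under `/usr/local/cuda`, depending on how it was installed) is probably not on `PATH`.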
For the CUDA Toolkit installers, see https://developer.nvidia.com/cuda-toolkit-archive. Be sure to install the toolkit version that matches the CUDA version in the virtual environment (12.1 in my case).
If the installed CUDA Toolkit version does not match the CUDA version in the virtual environment, the build fails with this error:
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 11.3.
In some cases, a minor-version mismatch will not cause later errors: https://github.com/NVIDIA/apex/pull/323#discussion_r287021798. You can try commenting out this check (at your own risk).
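To catch this mismatch before a long build, you can compare the release reported by the system's `nvcc --version` with `torch.version.cuda` from the PyTorch build in the virtual environment. This is a minimal sketch, assuming `nvcc` is on `PATH` and that the major.minor versions must be equal (which is how the apex check quoted above appears to behave); all helper names here are hypothetical.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 12.1'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release number found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version_string):
    """Turn a string such as '12.1' (as in torch.version.cuda) into (12, 1)."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def versions_match(nvcc_output, torch_cuda):
    """True if the system toolkit and PyTorch's CUDA share major.minor."""
    return parse_nvcc_release(nvcc_output) == parse_torch_cuda(torch_cuda)


def check_environment():
    """Run inside the activated conda environment that has PyTorch installed."""
    import torch  # imported lazily so the parsing helpers work without torch

    if torch.version.cuda is None:
        raise SystemExit("This PyTorch build has no CUDA support")
    nvcc_out = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    print("system nvcc:", parse_nvcc_release(nvcc_out))
    print("torch built with:", parse_torch_cuda(torch.version.cuda))
    print("match" if versions_match(nvcc_out, torch.version.cuda) else "MISMATCH")
```

Calling `check_environment()` from a Python shell in the activated environment prints both versions; a `MISMATCH` line suggests installing the toolkit version matching PyTorch before running the apex build.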
I ran the following command:
On the first run it failed, and the reason was:
Still, this was progress. The assorted `pip install` commands I had tried before failed with errors such as `TypeError: str` or `No module named torch` (even though I definitely had PyTorch installed). On the second run the output was too verbose, so I dropped the `-v` option; after waiting several minutes (about 10?), it reported success, which is frankly absurd.