bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Help me, I'm dying soon, error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1 error: subprocess-exited-with-error #382

Open listwebit opened 1 year ago

listwebit commented 1 year ago

I used the following installation method, but I get an error that I have not been able to resolve for several days:

git clone https://github.com/NVIDIA/apex
cd apex
pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check . 2>&1 | tee build.log

The environment is as follows:

nvidia-smi: CUDA Version: 10.2
/usr/local/cuda/bin/nvcc -V: Cuda compilation tools, release 10.2, V10.2.89

pip --default-timeout=10000 install torch==1.12.0+cu102 torchvision==0.13.0+cu102 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu102

I also upgraded gcc:
yum install centos-release-scl
yum install devtoolset-7*

Activate the corresponding devtoolset. You can install several devtoolset versions side by side and switch to the one you need with the following command:

scl enable devtoolset-8 bash
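Since apex's extensions are compiled with the local gcc and nvcc, it can help to confirm which compiler, CUDA toolkit, and torch CUDA version are actually visible in the active shell. The following is only a minimal sketch and not part of the original setup: the helper names (`parse_nvcc_release`, `tool_output`, `report`) are hypothetical, and it assumes `gcc` and `nvcc` are on PATH (anything missing is reported as "not found").

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the CUDA release (e.g. '10.2') found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def tool_output(cmd):
    """Run a command and return its stdout, or None if the tool is missing or fails to run."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout


def report():
    """Collect the gcc path, nvcc release, and torch CUDA version visible right now."""
    info = {"gcc": shutil.which("gcc")}
    nvcc_text = tool_output(["nvcc", "-V"])
    info["nvcc_release"] = parse_nvcc_release(nvcc_text) if nvcc_text else None
    try:
        import torch  # assumption: run inside the environment where torch is installed
        info["torch_cuda"] = torch.version.cuda
    except ImportError:
        info["torch_cuda"] = None
    return info


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value if value is not None else 'not found'}")
```

Running this inside the shell started by `scl enable` shows whether the gcc path points at the intended devtoolset and whether the nvcc release matches the CUDA version torch was built for.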

The error is as follows:

/home/liulei/miniconda3/envs/pretrain/lib/python3.9/site-packages/torch/include/ATen/core/function_schema.h:522:20: note: ‘c10::toString’
 inline std::string toString(const FunctionSchema& schema) {
                    ^~~~
/home/liulei/miniconda3/envs/pretrain/lib/python3.9/site-packages/torch/include/ATen/core/function_schema.h:522:20: note: ‘c10::toString’
/home/liulei/miniconda3/envs/pretrain/lib/python3.9/site-packages/torch/include/ATen/core/function_schema.h:522:20: note: ‘c10::toString’
error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1
error: subprocess-exited-with-error

× Running setup.py install for apex did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /home/liulei/miniconda3/envs/pretrain/bin/python -u -c '
exec(compile('"'"''"'"''"'"'

import os, sys, tokenize

try:
    import setuptools
except ImportError as error:
    print(
        "ERROR: Can not execute setup.py since setuptools is not available in "
        "the build environment.",
        file=sys.stderr,
    )
    sys.exit(1)

__file__ = %r
sys.argv[0] = __file__

if os.path.exists(__file__):
    filename = __file__
    with tokenize.open(__file__) as f:
        setup_py_code = f.read()
else:
    filename = ""
    setup_py_code = "from setuptools import setup; setup()"

exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'/home/liulei/liulei2/apex-master/setup.py'"'"',), "", "exec"))' --cpp_ext --cuda_ext install --record /tmp/pip-record-bugw92kn/install-record.txt --single-version-externally-managed --compile --install-headers /home/liulei/miniconda3/envs/pretrain/include/python3.9/apex
cwd: /home/liulei/liulei2/apex-master/
Running setup.py install for apex: finished with status 'error'
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> apex

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
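The compiler lines quoted above are all gcc `note:` lines, which gcc prints as follow-ups to an earlier diagnostic. Because the install command writes its full output to build.log, the first compiler `error:` lines in that file may be more informative than the tail of the log. Below is a minimal sketch for pulling them out. The function name `compiler_errors` and the matching pattern are assumptions made for illustration; they are not part of apex or pip.

```python
import re
from pathlib import Path

# Matches compiler-style diagnostics such as "file.cpp:12:5: error: ..." or
# "...: fatal error: ...". The setuptools summary line "error: command ... failed"
# has no location prefix and is deliberately not matched.
ERROR_RE = re.compile(r":\s*(?:fatal )?error:")


def compiler_errors(lines, limit=10):
    """Return up to `limit` (line_number, text) pairs that look like compiler errors."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_RE.search(line):
            hits.append((number, line.rstrip("\n")))
            if len(hits) >= limit:
                break
    return hits


if __name__ == "__main__":
    log_path = Path("build.log")  # the file written by `tee build.log`
    if log_path.exists():
        with log_path.open(errors="replace") as log:
            for number, text in compiler_errors(log):
                print(f"{number}: {text}")
    else:
        print("build.log not found in the current directory")
```

Run from the apex directory, this prints the line numbers of the earliest matching diagnostics, so the surrounding context can be read directly in build.log.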