facebookresearch / metaseq

Repo for external large-scale work
MIT License

Setting Up Apex on a Sagemaker instance? #82

Closed Leli1024 closed 2 years ago

Leli1024 commented 2 years ago

🐛 Bug

Following the installation instructions on an Amazon SageMaker instance (and on my local machine, an M1 MacBook) produces the following error while Apex is being installed, even if lines 101-107 are commented out:

In file included from csrc/multi_tensor_adagrad.cu:3:
/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
error
ERROR: Command errored out with exit status 1: /home/studio-lab-user/.conda/envs/studiolab/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-s0m8_6cj/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-s0m8_6cj/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext --deprecated_fused_adam --xentropy --fast_multihead_attn install --record /tmp/pip-record-dip5e6r_/install-record.txt --single-version-externally-managed --compile --install-headers /home/studio-lab-user/.conda/envs/studiolab/include/python3.9/apex Check the logs for full command output.
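
The fatal error in this log is that nvcc could not find cusparse.h, the header for NVIDIA's cuSPARSE library, which normally ships with the CUDA toolkit. One possible first check is whether that header exists under the toolkit's include directory. The sketch below is only a diagnostic aid, not part of metaseq or Apex: the function name find_cuda_header is hypothetical, and the candidate roots (the CUDA_HOME and CUDA_PATH environment variables, plus /usr/local/cuda taken from the nvcc path in the log) are assumptions that may not match a given machine.

```python
import os
from pathlib import Path


def find_cuda_header(header="cusparse.h", cuda_roots=None):
    """Return every <root>/include/<header> that exists among the candidate roots.

    Hypothetical helper for diagnosis only; it does not change the build.
    """
    if cuda_roots is None:
        # Assumed candidate locations: common CUDA environment variables,
        # plus the directory that holds nvcc in the error log above.
        cuda_roots = [
            os.environ.get("CUDA_HOME"),
            os.environ.get("CUDA_PATH"),
            "/usr/local/cuda",
        ]

    found = []
    for root in cuda_roots:
        if not root:
            # Skip unset environment variables and empty strings.
            continue
        candidate = Path(root) / "include" / header
        if candidate.is_file():
            found.append(candidate)
    return found


if __name__ == "__main__":
    hits = find_cuda_header()
    if hits:
        for path in hits:
            print(f"found: {path}")
    else:
        print("cusparse.h was not found under any candidate CUDA root")
```

If the script reports that the header is missing, the CUDA installation visible to the build may be incomplete (for example, a runtime without the development headers), which would be consistent with the error above.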

To Reproduce

The installation instructions were followed exactly up to the point of this error.

suchenzang commented 2 years ago

If this is Apex-specific, could you open an issue at https://github.com/NVIDIA/apex/issues instead?