NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License
8.33k stars 1.39k forks

can't install apex #1594

Open haiqizhang opened 1 year ago

haiqizhang commented 1 year ago
 Running command python setup.py egg_info
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/home/glm/apex/setup.py", line 4, in <module>
      from packaging.version import parse, Version
  ModuleNotFoundError: No module named 'packaging'
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/letrain/miniconda/envs/glm/bin/python -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/home/glm/apex/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-pijoox9c
  cwd: /home/glm/apex/
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

My environment: PyTorch==1.7.0, Python==3.8.15, CUDA==11.0

fazlicodes commented 1 year ago

I'm getting the same issue

pkosel commented 1 year ago

Same

crcrpar commented 1 year ago

  File "/home/glm/apex/setup.py", line 4, in <module>
    from packaging.version import parse, Version
ModuleNotFoundError: No module named 'packaging'

Could you install packaging and retry?

pkosel commented 1 year ago

That solved it for me.

fazlicodes commented 1 year ago

Doesn't work for me, packaging already installed

honeysuckcle commented 1 year ago

  File "/home/glm/apex/setup.py", line 4, in <module>
    from packaging.version import parse, Version
ModuleNotFoundError: No module named 'packaging'

Could you install packaging and retry?

It works. Thank you!

zw-xxx commented 1 year ago

Doesn't work for me, packaging already installed

I ran into the same situation and solved it by installing packaging with conda install packaging rather than pip install packaging.

After that, apex installed successfully for me.
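
A possible explanation for why conda install packaging helps when pip install packaging does not is that pip and python may belong to different environments. The thread does not confirm this, so treat it as an assumption. The sketch below only reports which interpreter is running and whether it can import packaging; the helper name module_available is made up for illustration.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None


# sys.executable shows which Python (and therefore which environment) is being checked.
print("interpreter:", sys.executable)
print("packaging importable:", module_available("packaging"))
```

Running it with the same python that pip uses for the apex build shows whether that environment actually sees the package.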

DjokerR commented 1 year ago

My setup: CUDA 11.7, Python 3.7.6, torch==1.13.1, torchvision==0.14.1, torchaudio==0.13.1

unzip apex.zip
cd apex

1. Install without CUDA extensions:
python setup.py install

2. Install with CUDA extensions:
python setup.py install --cuda_ext

The compilation takes a long time, at least ten minutes, and the CPU and memory footprint is 8c 8G.

zp2459 commented 1 year ago

Same problem:

Using pip 22.3.1 from /home/panz/anaconda3/envs/gpt/lib/python3.8/site-packages/pip (python 3.8)
WARNING: Implying --no-binary=:all: due to the presence of --build-option / --global-option / --install-option. Consider using --config-settings for more flexibility.
DEPRECATION: --no-binary currently disables reading from the cache of locally built wheels. In the future --no-binary will not influence the wheel cache. pip 23.1 will enforce this behaviour change. A possible replacement is to use the --no-cache-dir option. You can use the flag --use-feature=no-binary-enable-wheel-cache to test the upcoming behaviour. Discussion can be found at https://github.com/pypa/pip/issues/11453
Processing /home/panz/project/ColossalAI/examples/language/gpt/gemini/apex
  Running command python setup.py egg_info

  torch.__version__ = 1.12.0+cu113

  running egg_info
  creating /tmp/pip-pip-egg-info-7w8pc_qc/apex.egg-info
  writing /tmp/pip-pip-egg-info-7w8pc_qc/apex.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-7w8pc_qc/apex.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-pip-egg-info-7w8pc_qc/apex.egg-info/requires.txt
  writing top-level names to /tmp/pip-pip-egg-info-7w8pc_qc/apex.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-7w8pc_qc/apex.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-pip-egg-info-7w8pc_qc/apex.egg-info/SOURCES.txt'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-pip-egg-info-7w8pc_qc/apex.egg-info/SOURCES.txt'
  Preparing metadata (setup.py) ... done
Requirement already satisfied: packaging>20.6 in /home/panz/anaconda3/envs/gpt/lib/python3.8/site-packages (from apex==0.1) (23.0)
Installing collected packages: apex
  DEPRECATION: apex is being installed using the legacy 'setup.py install' method, because the '--no-binary' option was enabled for it and this currently disables local wheel building for projects that don't have a 'pyproject.toml' file. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/11451
  Running command Running setup.py install for apex

  torch.__version__ = 1.12.0+cu113

  Compiling cuda extensions with
  nvcc: NVIDIA (R) Cuda compiler driver
  Copyright (c) 2005-2023 NVIDIA Corporation
  Built on Tue_Feb__7_19:32:13_PST_2023
  Cuda compilation tools, release 12.1, V12.1.66
  Build cuda_12.1.r12.1/compiler.32415258_0
  from /home/panz/anaconda3/envs/gpt/bin

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/home/panz/project/ColossalAI/examples/language/gpt/gemini/apex/setup.py", line 171, in <module>
      check_cuda_torch_binary_vs_bare_metal(CUDA_HOME)
    File "/home/panz/project/ColossalAI/examples/language/gpt/gemini/apex/setup.py", line 33, in check_cuda_torch_binary_vs_bare_metal
      raise RuntimeError(
  RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 11.3. In some cases, a minor-version mismatch will not cause later errors: https://github.com/NVIDIA/apex/pull/323#discussion_r287021798. You can try commenting out this check (at your own risk).
  error: subprocess-exited-with-error

  × Running setup.py install for apex did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/panz/anaconda3/envs/gpt/bin/python -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/home/panz/project/ColossalAI/examples/language/gpt/gemini/apex/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' --cpp_ext --cuda_ext install --record /tmp/pip-record-ss3s_tnl/install-record.txt --single-version-externally-managed --compile --install-headers /home/panz/anaconda3/envs/gpt/include/python3.8/apex
  cwd: /home/panz/project/ColossalAI/examples/language/gpt/gemini/apex/
  Running setup.py install for apex ... error
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> apex

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
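
The RuntimeError in this log is a different failure from the missing packaging module: nvcc reports release 12.1 while the PyTorch binaries were built with CUDA 11.3. A minimal sketch of that kind of comparison follows. It is not apex's actual check, the function names are hypothetical, and the strict major.minor rule is an assumption made for illustration.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(torch_cuda: str) -> Tuple[int, int]:
    """Turn a string such as '11.3' (the form of torch.version.cuda) into (11, 3)."""
    major, minor = torch_cuda.split(".")[:2]
    return int(major), int(minor)


def versions_match(torch_cuda: str, nvcc_output: str) -> bool:
    """True when the toolkit found by nvcc has the same major.minor as PyTorch's CUDA."""
    return parse_nvcc_release(nvcc_output) == parse_torch_cuda(torch_cuda)


# Values taken from the log: PyTorch built with CUDA 11.3, nvcc from release 12.1.
nvcc_text = "Cuda compilation tools, release 12.1, V12.1.66"
print(versions_match("11.3", nvcc_text))  # False
```

In a real environment the first argument would come from torch.version.cuda and the second from the output of nvcc --version.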

Linghuxc commented 1 year ago

thank you very much

SeekPoint commented 1 year ago

conda install packaging does not work for me; the bug still exists.

LambdaGuard commented 1 year ago

conda install packaging does not work. I tried 'git checkout' of an older version and apex was installed successfully.

the-yanqi commented 1 year ago

conda install packaging does not work. I tried 'git checkout' of an older version and apex was installed successfully.

Which older version did you use?

LambdaGuard commented 1 year ago

conda install packaging does not work. I tried 'git checkout' of an older version and apex was installed successfully.

Which older version did you use?

I checked out commit "6943fd26e04c59327de32592cf5af68be8f5c44e" and it works (I actually picked an older one at random).

zarzen commented 1 year ago

Using the release 23.05 works for me (https://github.com/NVIDIA/apex/tags) Hash tag: 0da3ffb

karol-nowakowski commented 1 year ago

Using the release 23.05 works for me (https://github.com/NVIDIA/apex/tags) Hash tag: 0da3ffb

This release worked for me too, thanks @zarzen !

My environment: Pytorch==2.1.0.dev20230725+cu121 Python==3.10.6 CUDA==12.1

shjwudp commented 1 year ago

Try adding the --no-build-isolation flag to pip install.

lixin4ever commented 1 year ago

Try adding the --no-build-isolation flag to pip install.

After trying all the solutions above (with the issue still unresolved), this one eventually worked for me 👍

wangbxj1234 commented 1 year ago

Try adding the --no-build-isolation flag to pip install.

thx

aakejiang commented 1 year ago

Try adding the --no-build-isolation flag to pip install.

It works for me. The full command is:

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
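
For context on why this flag matters: with build isolation enabled, pip builds the package in a temporary environment that does not see packages already installed in the current one, and --no-build-isolation makes pip build against the current environment instead. The sketch below assembles the same command as an argument list so it can be run from a script; build_pip_command is a hypothetical helper, and only flags quoted in this thread are used.

```python
import sys
from typing import List


def build_pip_command(source_dir: str = "./", cpp_ext: bool = True, cuda_ext: bool = True) -> List[str]:
    """Assemble the pip invocation quoted above as a list suitable for subprocess.run."""
    cmd = [
        sys.executable, "-m", "pip", "install", "-v",
        "--disable-pip-version-check",
        "--no-cache-dir",
        "--no-build-isolation",  # build against the current environment
    ]
    if cpp_ext:
        cmd += ["--config-settings", "--build-option=--cpp_ext"]
    if cuda_ext:
        cmd += ["--config-settings", "--build-option=--cuda_ext"]
    cmd.append(source_dir)
    return cmd


# Print the command instead of running it, so the sketch has no side effects.
print(" ".join(build_pip_command()))
```

Passing the list to subprocess.run(cmd, check=True) from inside the apex checkout would start the build. Using sys.executable -m pip keeps pip tied to the interpreter that runs the script.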

ahengg commented 10 months ago

Try adding the --no-build-isolation flag to pip install.

Thank you, it worked.

Fire-Star commented 10 months ago

6943fd26e04c59327de32592cf5af68be8f5c44e

I chose 22.3 from the tags and it succeeded.

mikenetrino commented 10 months ago

pip uninstall setuptools
pip install setuptools==60.2.0
pip install packaging
rm -R apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

It will work :)

VinceChin commented 10 months ago

I have tried all of the above, but it is still not working for me.

JuicyJeong commented 10 months ago

pip uninstall setuptools
pip install setuptools==60.2.0
pip install packaging
rm -R apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

It will work :)

it works! thanks mikenetrino :)

rocke2020 commented 10 months ago

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

pip uninstall setuptools
pip install setuptools==60.2.0
pip install packaging
rm -R apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

It will work :)

it works! thanks mikenetrino :)

@JuicyJeong could you share your environment (the versions of Python, CUDA, PyTorch, etc.) by running the following? Thanks!

python -m torch.utils.collect_env

h2222 commented 9 months ago

--no-build-isolation

hero

CrackerHax commented 9 months ago

Try adding the --no-build-isolation flag to pip install.

Worked for me

tbbatbb commented 9 months ago

Using the release 23.05 works for me (https://github.com/NVIDIA/apex/tags) Hash tag: 0da3ffb

Worked for me. Thank you my hero.

wuusn commented 9 months ago

The full script is:

wget https://github.com/NVIDIA/apex/archive/refs/tags/23.05.zip
unzip 23.05.zip
cd apex-23.05/
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
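
The download and unpack steps of this script can also be sketched in Python, for machines without wget or unzip. The URL pattern and the apex-23.05 directory name are taken from the script above; the function names are hypothetical, and the final pip install step is left to the command already shown.

```python
import urllib.request
import zipfile
from pathlib import Path


def release_zip_url(tag: str) -> str:
    """Build the archive URL for a tagged apex release, following the wget line above."""
    return f"https://github.com/NVIDIA/apex/archive/refs/tags/{tag}.zip"


def download_and_unpack(tag: str, dest: Path) -> Path:
    """Download the release archive into `dest`, extract it, and return the source directory."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"{tag}.zip"
    urllib.request.urlretrieve(release_zip_url(tag), str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    # The archive unpacks into apex-<tag>, matching the `cd apex-23.05/` step above.
    return dest / f"apex-{tag}"


print(release_zip_url("23.05"))
```

Calling download_and_unpack("23.05", Path(".")) needs network access and returns the directory in which to run the pip install command.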

Huanxi2018 commented 8 months ago

Try adding the --no-build-isolation flag to pip install.

After trying all the solutions above (with the issue still unresolved), this one eventually worked for me 👍

It works. Thanks a lot.

MikeBrock03 commented 6 months ago

pip uninstall setuptools
pip install setuptools==60.2.0
pip install packaging
rm -R apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

It will work :)

it did! ♥️

kennedyCzar commented 6 months ago

Try:

pip install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

It solved mine.

FunnyRainn commented 6 months ago

Try adding the --no-build-isolation flag to pip install.

cooooooooooooollll!!! it works! thanks!!

FunnyRainn commented 6 months ago

Try adding the --no-build-isolation flag to pip install.

cooooooooooooollll!!! it works! thanks!!

I use this:

pip install -v --no-cache-dir . --no-build-isolation

kelvin0207 commented 5 months ago

Using the release 23.05 works for me (https://github.com/NVIDIA/apex/tags) Hash tag: 0da3ffb

This works for me! Thank you so much

miko8422 commented 4 months ago

0da3ffb

Thanks, I solved it by changing to this version. There were some weird warnings during installation, but it installed in the end. If I run into problems later, I'll leave a comment here.

EvelynXZY commented 4 months ago

Got the same error.

Try adding the --no-build-isolation flag to pip install.

Combined with downgrading setuptools from version 70, this works!

pip uninstall -y setuptools
pip install setuptools==69.5.1
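
Before downgrading, it can help to confirm which setuptools version is actually installed. The sketch below reads it with importlib.metadata and compares the major version against 70. That threshold comes only from the comment above and is an assumption; the thread does not say why version 70 fails. The function names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_setuptools_version() -> Optional[str]:
    """Return the installed setuptools version string, or None if it is not installed."""
    try:
        return metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        return None


def needs_downgrade(version: str, first_bad_major: int = 70) -> bool:
    """True when the major version is at or above the threshold reported in the thread."""
    return int(version.split(".")[0]) >= first_bad_major


current = installed_setuptools_version()
if current is None:
    print("setuptools is not installed in this environment")
else:
    print(f"setuptools {current}: downgrade suggested = {needs_downgrade(current)}")
```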

miko8422 commented 4 months ago

Got the same error.

Try adding the --no-build-isolation flag to pip install.

Combined with downgrading setuptools from version 70, this works!

pip uninstall -y setuptools
pip install setuptools==69.5.1

I've kind of given up on apex; it doesn't even have a stable version. I may play with it later, but for now I just want the job done, haha. Thanks.

mahmoodn commented 3 months ago

Following... have the same issue.

Sameer-13 commented 3 months ago

I used to get this error: ModuleNotFoundError: No module named 'packaging'. I tried most of the solutions mentioned above; however, I now face a new error:

    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda-10.0/bin/nvcc': '/usr/local/cuda-10.0/bin/nvcc'
error: subprocess-exited-with-error

× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

How can I solve it?
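
The thread gives no confirmed fix for this error. One plausible reading, offered as an assumption, is that the build is looking for a CUDA toolkit under /usr/local/cuda-10.0 (for example through the CUDA_HOME environment variable) and no nvcc exists there. The sketch below reports where nvcc can actually be found; find_nvcc is a hypothetical helper.

```python
import os
import shutil
from typing import Optional


def find_nvcc(cuda_home: Optional[str]) -> Optional[str]:
    """Return cuda_home/bin/nvcc if that file exists, otherwise whatever nvcc is on PATH."""
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("nvcc")


# CUDA_HOME may be unset; in that case only PATH is searched.
cuda_home = os.environ.get("CUDA_HOME")
print("CUDA_HOME:", cuda_home)
print("nvcc:", find_nvcc(cuda_home))
```

If the printed CUDA_HOME points at a directory without bin/nvcc, pointing it at an installed toolkit before rerunning pip install would be the first thing to try.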

chenrongboo commented 2 months ago

Try adding the --no-build-isolation flag to pip install.

Thanks!

mariem-m11 commented 1 month ago

Try adding the --no-build-isolation flag to pip install.

Thank you