NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Fail to install apex: TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' #1697

Closed yu1679959321 closed 1 year ago

yu1679959321 commented 1 year ago

Describe the Bug

I'm trying to install apex from source on Windows 11:

    pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" .

and I get this error:

    Traceback (most recent call last):
    File "E:\Learn\Python\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
      main()
    File "E:\Learn\Python\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "E:\Learn\Python\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 149, in prepare_metadata_for_build_wheel
      return hook(metadata_directory, config_settings)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "E:\Learn\Python\Lib\site-packages\setuptools\build_meta.py", line 377, in prepare_metadata_for_build_wheel
      self.run_setup()
    File "E:\Learn\Python\Lib\site-packages\setuptools\build_meta.py", line 335, in run_setup
      exec(code, locals())
    File "<string>", line 136, in <module>
    File "<string>", line 23, in get_cuda_bare_metal_version
  TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

The README.md describes another method:

    pip install -v --no-cache-dir .

but this fails with ModuleNotFoundError: No module named 'packaging', even though I do have 'packaging' installed:

>pip3 install packaging
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: packaging in e:\learn\python\lib\site-packages (23.1)

Here is my environment:

PyTorch version: 2.1.0.dev20230718+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU
Nvidia driver version: 536.40
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3201
DeviceID=CPU0
Family=107
L2CacheSize=4096
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3201
Name=AMD Ryzen 7 7735H with Radeon Graphics
ProcessorType=3
Revision=17409

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] numpy==1.24.1
[pip3] torch==2.1.0.dev20230718+cu121
[pip3] torchaudio==2.1.0.dev20230718+cu121
[pip3] torchvision==0.16.0.dev20230718+cu121
[conda] Could not collect

yu1679959321 commented 1 year ago

Hey guys, I made some progress: the issue is caused by the CUDA toolkit not being installed. Following the steps under "Pip Wheels" solves it.
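
For anyone hitting the same traceback: a TypeError from 'NoneType' + 'str' inside get_cuda_bare_metal_version is consistent with the build script joining a CUDA root directory that was never found to a path suffix. That reading is an assumption based on the traceback, not on the apex source. A minimal sketch of a pre-flight check, using only the standard library, is below. The helper name find_nvcc is hypothetical and is not part of apex or PyTorch, and the CUDA_HOME and CUDA_PATH variables are assumed to be the ones the toolkit installer sets.

```python
import os
import shutil
from pathlib import Path


def find_nvcc(env=None, which=shutil.which):
    """Return the path to nvcc, or None if no CUDA toolkit is visible.

    Hypothetical helper for illustration; not part of apex or PyTorch.
    """
    env = os.environ if env is None else env
    exe = "nvcc.exe" if os.name == "nt" else "nvcc"
    # Assumption: the toolkit root is advertised through one of these variables.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if root:
            candidate = Path(root) / "bin" / exe
            if candidate.is_file():
                return str(candidate)
    # Otherwise fall back to whatever nvcc is on PATH.
    return which("nvcc")


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("No nvcc found: install the CUDA toolkit or set CUDA_HOME.")
    else:
        print(f"Found nvcc at {nvcc}")
```

If this prints that no nvcc was found, the driver alone is installed and the toolkit (which ships the compiler) is still missing, which matches the "CUDA runtime version: Could not collect" line in the environment report.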

Now I have to downgrade CUDA from 12.2 to 12.1.

yu1679959321 commented 1 year ago

Fortunately, downgrading the CUDA toolkit was all it took to make it work.
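
The downgrade from 12.2 to 12.1 lines up with the "CUDA used to build PyTorch: 12.1" entry in the environment report. A rough sketch of that comparison follows. It assumes a strict major.minor match is required, which is what this thread suggests rather than a documented rule. The function names are hypothetical, and the sample string only imitates the shape of `nvcc -V` output (its build number is made up).

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from text in the shape printed by `nvcc -V`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolkit_matches_torch(nvcc_output, torch_cuda_version):
    """True when the toolkit's major.minor equals PyTorch's CUDA build version."""
    torch_major, torch_minor = (int(p) for p in torch_cuda_version.split(".")[:2])
    return parse_nvcc_release(nvcc_output) == (torch_major, torch_minor)


# Illustrative sample only; real output has more lines and a different build number.
sample = "Cuda compilation tools, release 12.2, V12.2.91"
print(toolkit_matches_torch(sample, "12.1"))  # False
print(toolkit_matches_torch(sample.replace("12.2", "12.1"), "12.1"))  # True
```

In a real environment the second argument would come from torch.version.cuda, and the first from running nvcc with the -V flag.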