NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Setting up Apex and get this error: ModuleNotFoundError: No module named 'torch' #1823

Closed Mayolov closed 3 months ago

Mayolov commented 3 months ago

Describe the Bug

I followed this setup in order to train the model in this repo: https://github.com/implus/mae_segmentation

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

This is how I set up my conda environment:

conda create -n mmseg python=3.8
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
pip install mmcv-full==1.3.0 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.6/index.html
pip install scipy timm==0.3.2
conda install -c nvidia cuda-nvcc  # problem arose before and after installing this
pip install mmsegmentation==0.11.0

When I run this command from inside the apex directory:

pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

I get this output:

[filler apex]$ pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Using pip 24.0 from /vast/home/mayolo/.conda/envs/mmseg/lib/python3.8/site-packages/pip (python 3.8)
DEPRECATION: --build-option and --global-option are deprecated. pip 24.2 will enforce this behaviour change. A possible replacement is to use --config-settings. Discussion can be found at https://github.com/pypa/pip/issues/11859
WARNING: Implying --no-binary=:all: due to the presence of --build-option / --global-option.
Processing /vast/home/mayolo/apex
  Running command pip subprocess to install build dependencies
  Collecting setuptools
    Using cached setuptools-72.1.0-py3-none-any.whl
  Collecting wheel
    Using cached wheel-0.43.0-py3-none-any.whl
  Installing collected packages: wheel, setuptools
  Successfully installed setuptools-72.1.0 wheel-0.43.0
  Installing build dependencies ... done
  Running command Getting requirements to build wheel
  Traceback (most recent call last):
    File "/vast/home/mayolo/.conda/envs/mmseg/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/vast/home/mayolo/.conda/envs/mmseg/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/vast/home/mayolo/.conda/envs/mmseg/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/ram/tmp/pip-build-env-5ganjv56/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 327, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=[])
    File "/ram/tmp/pip-build-env-5ganjv56/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 297, in _get_build_requires
      self.run_setup()
    File "/ram/tmp/pip-build-env-5ganjv56/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 313, in run_setup
      exec(code, locals())
    File "<string>", line 10, in <module>
  ModuleNotFoundError: No module named 'torch'
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /vast/home/mayolo/.conda/envs/mmseg/bin/python /vast/home/mayolo/.conda/envs/mmseg/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpywx7_ti5
  cwd: /vast/home/mayolo/apex
  Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
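
The setuptools paths in the traceback sit under a temporary pip-build-env-... directory, which suggests setup.py ran inside pip's isolated build environment. Only setuptools and wheel were installed there, so the torch installed in the conda environment would not be visible to it. Assuming that is the cause, a minimal sketch of a check plus a retry command might look like the following. The helper names are hypothetical, and the only change to the original command is the added --no-build-isolation flag, which is a standard pip option but is not confirmed as the fix anywhere in this thread.

```python
import importlib.util
import shlex
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


def build_install_command(source_dir: str = "./") -> list:
    """Rebuild the original pip command with build isolation disabled.

    Hypothetical helper: the flags are copied from the command in the report,
    with --no-build-isolation added so that setup.py runs against the packages
    already installed in the active environment (including torch).
    """
    return [
        sys.executable, "-m", "pip", "install", "-v",
        "--disable-pip-version-check", "--no-cache-dir",
        "--no-build-isolation",
        "--global-option=--cpp_ext",
        "--global-option=--cuda_ext",
        source_dir,
    ]


if __name__ == "__main__":
    if not module_available("torch"):
        # Without torch in this environment, disabling isolation cannot help.
        print("torch is not importable here; install PyTorch into this environment first")
    else:
        print("torch found; a command to try from inside the apex directory:")
        print(shlex.join(build_install_command()))
```

Run it with the same interpreter that pip uses (here, the one in the mmseg environment), so the check reflects what setup.py would see once isolation is off. The pip log above already warns that --global-option is deprecated, so newer pip versions may need the --config-settings form instead.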

Coding-Zuo commented 3 months ago

+1

Coding-Zuo commented 3 months ago

I have the same issue. Thanks for posting and could you let me know if you got a solution yet?

Mayolov commented 3 months ago

> I have the same issue. Thanks for posting and could you let me know if you got a solution yet?

Hey, this command worked for me: python3 setup.py install. I have yet to start training, but my program was finally able to import Apex as a module.
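
One caveat worth checking: as far as I can tell, running setup.py without the --cpp_ext and --cuda_ext flags builds only the Python package, so the compiled extensions may still be missing even though apex imports. A rough sketch of a post-install check is below. The extension module names apex_C and amp_C are assumptions about what those two flags produce, and check_modules is a hypothetical helper.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to whether the interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "apex" is the Python package itself. "apex_C" and "amp_C" are assumed to
    # be the compiled extensions built by --cpp_ext and --cuda_ext respectively.
    for name, found in check_modules(["torch", "apex", "apex_C", "amp_C"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If apex is found but the extension modules are missing, the install is probably Python-only, which may be enough for some training scripts and not for others.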

JianboTang commented 3 months ago

I encounter this problem, too~

Bender-L commented 3 months ago

> I encounter this problem, too~

Hello, have you solved this problem?

tengjn commented 3 months ago

+1

Mayolov commented 3 months ago

Here are a couple of posts that I followed. I specifically remember downloading a non-master branch and following these steps to get it working. Try either of these:
https://github.com/NVIDIA/apex/issues/1737#issuecomment-1762662648
https://github.com/NVIDIA/apex/issues/1594#issuecomment-1822218819

AleenahK commented 3 months ago

I am facing the same issue. Please help!

mirrorboat commented 1 month ago

https://github.com/NVIDIA/apex/issues/1748#issuecomment-1928910265 This may help

khwengXU commented 3 weeks ago

I added "sudo" to my command, and it worked.