NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

pipeline_parallel - ModuleNotFoundError: No module named 'amp_C' #1204

Open MatthieuCed opened 2 years ago

MatthieuCed commented 2 years ago

Hi,

I was using apex and everything worked well until this week, when I started getting the following error:

---> 39 from apex import amp
---> 20 from . import transformer
----> 3 from apex.transformer import pipeline_parallel
----> 1 from apex.transformer.pipeline_parallel.schedules import get_forward_backward_func
----> 4 from apex.transformer.pipeline_parallel.utils import get_num_microbatches
---> 23 import amp_C

ModuleNotFoundError: No module named 'amp_C'

I saw that this error occurs in other scenarios (https://github.com/NVIDIA/apex/issues/865), but not at the same place. Do you know how to solve it? (PS: I'm running the code in Google Colab.) Thanks,

jmusiel commented 2 years ago

I have the same issue in pipeline_parallel, with the same stack trace.

I am on the Python-only build of apex

lunasara commented 2 years ago

I have the same issue. The workaround that worked for me is to use the version prior to the last commit here: https://github.com/NVIDIA/apex/tree/3303b3e7174383312a3468ef390060c26e640cb1

As you can see in the commit history, pipeline_parallel was added on October 27th in https://github.com/NVIDIA/apex/pull/1202. I don't know what the issue might be, but I hope the workaround works for you as well.

jmusiel commented 2 years ago

I have the same issue. The workaround that worked for me is to use the version prior to the last commit here: https://github.com/NVIDIA/apex/tree/3303b3e7174383312a3468ef390060c26e640cb1

Yes this worked for me as well, thank you!

Haoqing-Wang commented 2 years ago

I have the same issue. The workaround that worked for me is to use the version prior to the last commit here: https://github.com/NVIDIA/apex/tree/3303b3e7174383312a3468ef390060c26e640cb1

As you can see in the commit history, pipeline_parallel was added on October 27th in #1202. I don't know what the issue might be, but I hope the workaround works for you as well.

How do I use the prior version of apex? Are the install commands still the ones below?

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
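One way to pin the clone to the earlier commit mentioned above is to check that commit out before installing. The following is only a sketch: the helper names `build_commands` and `run_all` are made up for illustration, and it assumes `git` is on the PATH. It only assembles (and optionally runs) the clone and checkout steps; the `pip install` command itself would be run unchanged from inside the checked-out directory.

```python
import subprocess

APEX_URL = "https://github.com/NVIDIA/apex"
# Commit named earlier in this thread as the last one that worked.
PINNED_COMMIT = "3303b3e7174383312a3468ef390060c26e640cb1"


def build_commands(commit=PINNED_COMMIT, dest="apex"):
    """Return the argv lists for cloning apex and checking out `commit`.

    Hypothetical helper for illustration; not part of apex.
    """
    return [
        ["git", "clone", APEX_URL, dest],
        # `git -C <dir>` runs the command as if started inside <dir>.
        ["git", "-C", dest, "checkout", commit],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_all() to execute.
    for argv in build_commands():
        print(" ".join(argv))
```

Whether the extensions then build correctly at that commit still depends on the local CUDA and PyTorch setup.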

lunasara commented 2 years ago

How do I use the prior version of apex? Are the install commands still the ones below?

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

I didn't try it. In my case I simply added the apex directory to the system path using the commands below and it worked for me.

import sys
sys.path.append("/workspace/apex")

crcrpar commented 2 years ago

Sorry for the trouble. I wrote a patch, #1211. Could some of you try https://github.com/NVIDIA/apex/commit/25bfcb914796c7955956fc1e39839e7efed997d5 and let me know whether it solves the issue?

yt605155624 commented 2 years ago

Sorry for the trouble. I wrote a patch, #1211. Could some of you try 25bfcb9 and let me know whether it solves the issue?

not solved for me

xhwengnthu commented 2 years ago

the problem doesn't seem to be solved yet

eonglints commented 2 years ago

Not solved for me either

PhdShi commented 2 years ago

not solved either

BraveSeeker commented 1 year ago

Not solved either

Dixeran commented 1 year ago

To check whether "amp_C" is really installed, add a print statement in setup.py around the place where the "amp_C" targets are added. I've seen that the pip install ... command sometimes does NOT trigger these targets, and I have to use a command of the form python setup.py install --cuda_ext instead.
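A quicker check that does not require editing setup.py is to ask Python whether it can locate the compiled module at all. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_extensions` is hypothetical, and the default module list only contains `amp_C` because that is the one named in the error.

```python
import importlib.util


def missing_extensions(names=("amp_C",)):
    """Return the top-level module names in `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_extensions()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All extension modules were found.")
```

If `amp_C` is reported as missing, the CUDA extension targets were most likely not built by the install command that was used.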

hitsz-qfy commented 1 year ago

To check whether "amp_C" is really installed, add a print statement in setup.py around the place where the "amp_C" targets are added. I've seen that the pip install ... command sometimes does NOT trigger these targets, and I have to use a command of the form python setup.py install --cuda_ext instead.

Really Awesome!

rocke2020 commented 9 months ago

How do I use the prior version of apex? Are the install commands still the ones below?

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

I didn't try it. In my case I simply added the apex directory to the system path using the commands below and it worked for me.

import sys
sys.path.append("/workspace/apex")

@lunasara, could you explain what "/workspace/apex" is? Thanks in advance!
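For context, `sys.path` is the list of directories Python searches on import, so the path in that snippet is presumably the directory where the apex repository was cloned (the one that contains the `apex` package folder). That reading is an assumption, not something confirmed in the thread. A small sketch of the same idea, with a hypothetical helper `add_source_dir` that avoids adding duplicates:

```python
import sys


def add_source_dir(path, search_path=None):
    """Append `path` to the import search path if it is not already there."""
    search_path = sys.path if search_path is None else search_path
    if path not in search_path:
        search_path.append(path)
    return search_path


# "/workspace/apex" is the directory from the comment above; substitute
# the location of your own clone.
add_source_dir("/workspace/apex")
```

Adding a directory to `sys.path` only makes its Python source importable; it does not build compiled extensions such as `amp_C`.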

hitsz-qfy commented 9 months ago

Hello, your email has been received.

BarathwajAnandan commented 1 month ago

To check whether "amp_C" is really installed, add a print statement in setup.py around the place where the "amp_C" targets are added. I've seen that the pip install ... command sometimes does NOT trigger these targets, and I have to use a command of the form python setup.py install --cuda_ext instead.

This is the solution! After 2 hours of trying different Docker images and compilation approaches, all I needed was to pass the arguments manually. Thank you! For me the issue was with the permutation_search installation.

Lijuming33 commented 2 weeks ago

cd /path/to/apex
python setup.py install --cpp_ext --cuda_ext

This resolved the "No module named 'amp_C'" error.
