Docker build failing - Githubissues

francescosarno commented 1 year ago

Running docker build . fails with the following error:

Dockerfile:59
--------------------
  57 |     RUN SHA=ToUcHMe git clone https://github.com/NVIDIA/apex.git
  58 |     WORKDIR /tmp/unique_for_apex/apex
  59 | >>> RUN /opt/miniconda3/envs/py37/bin/pip3 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
  60 |     #install pytorch3d 
  61 |     # RUN /opt/miniconda3/envs/py37/bin/pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py37_cu102_pyt171/download.html
--------------------
ERROR: failed to solve: process "/bin/sh -c /opt/miniconda3/envs/py37/bin/pip3 install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ." did not complete successfully: exit code: 1

Do you know how to solve this? It seems to be caused by apex.

DecaYale commented 1 year ago

Have you solved this? We tested this part and no issue occurred. Our code can also run with PyTorch's DistributedDataParallel instead of apex; you may just need to modify the code a little.

Kaladin-Syl-WR commented 1 year ago

Hi. I seem to be getting the same issue. Any idea what the problem might be and what I can do to fix it?

DecaYale commented 1 year ago

This might be caused by an update to the apex repo. I suggest commenting out this step and installing apex manually later, or replacing apex with PyTorch's DistributedDataParallel. If you are only running an evaluation, you can also run on a single GPU, which does not need apex.
I hope this helps.

mqtjean commented 1 year ago

Thank you for your answer. It works for me if I comment out this command, start my container, and then git clone and pip install apex manually. Also, apex.amp is deprecated, so I had to change "from apex import amp" to "from torch.cuda import amp".
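
The import swap described above could also be written so that the code keeps working whether or not apex is installed. The following is only a minimal sketch: the helper name import_first_available is made up for illustration and is not part of RNNPose, apex, or PyTorch. Note that apex.amp and torch.cuda.amp expose different APIs (torch.cuda.amp provides autocast and GradScaler, for example), so changing the import alone may not be enough, which matches the AttributeError reported further down this thread.

```python
import importlib


def import_first_available(candidates):
    """Return (name, module) for the first importable name in candidates.

    Hypothetical helper for illustration only; not an RNNPose or apex API.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    raise ImportError(f"none of {list(candidates)} could be imported")


# Assumed usage in place of a bare `from apex import amp`:
#   name, amp = import_first_available(["apex.amp", "torch.cuda.amp"])
# The caller still has to check `name`, because the two modules are not
# drop-in replacements for each other.
```

Returning the module name alongside the module lets the calling code branch on which backend was actually loaded instead of assuming apex-specific attributes exist.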

brian2lee commented 1 year ago

I've been facing the same problem. You mentioned changing "from apex import amp" to "from torch.cuda import amp": which file did you change? I can't find that line in the Dockerfile. Sorry if this is a basic question; I'm quite new to this.

Nishanth21D commented 3 weeks ago

Hey, I did as you mentioned, but it now fails with the error "module 'torch.cuda.amp' has no attribute 'float_function'":

Traceback (most recent call last):
  File "/home/RNNPose/tools/eval.py", line 26, in <module>
    from builder import (
  File "/home/RNNPose/builder/rnnpose_builder.py", line 1, in <module>
    from builder import losses_builder
  File "/home/RNNPose/builder/losses_builder.py", line 2, in <module>
    from model import losses
  File "/home/RNNPose/model/losses.py", line 22, in <module>
    class Loss(nn.Module):
  File "/home/RNNPose/model/losses.py", line 65, in Loss
    @amp.float_function
AttributeError: module 'torch.cuda.amp' has no attribute 'float_function'

Is there a workaround, or can I comment it out? Thanks in advance.
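
One possible workaround, sketched under assumptions rather than taken from the RNNPose authors: apex documents amp.float_function as a decorator that makes a function run in FP32, and torch.cuda.amp has no equivalent. A local stand-in could cast floating-point tensor arguments to float32 and disable autocast inside the call. The decorator below is hypothetical, only approximates apex's behaviour, and falls back to a plain pass-through when torch is not importable.

```python
import functools

try:
    import torch
except ImportError:  # lets the sketch be imported where torch is absent
    torch = None


def _to_fp32(value):
    """Cast floating-point tensors to float32; leave everything else as-is."""
    if torch is not None and torch.is_tensor(value) and value.is_floating_point():
        return value.float()
    return value


def float_function(fn):
    """Hypothetical stand-in for apex's amp.float_function (an approximation)."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        args = tuple(_to_fp32(a) for a in args)
        kwargs = {k: _to_fp32(v) for k, v in kwargs.items()}
        if torch is None:
            return fn(*args, **kwargs)
        # Disable autocast so ops inside fn are not run in half precision.
        with torch.cuda.amp.autocast(enabled=False):
            return fn(*args, **kwargs)

    return wrapper
```

With this defined in or imported into model/losses.py, the line @amp.float_function could be replaced by @float_function. If the code never runs under mixed precision (for example, a single-GPU evaluation in full precision), the decorator should have no effect on the results, so simply removing the decorator line is likely to behave the same in that case.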

mqtjean commented 3 weeks ago

Hello,

I'm sorry, but I don't work on this project anymore and no longer have access to the code.

I hope you find a solution.

Good luck with your project.

Jean

DecaYale / RNNPose

Docker build failing #20