Closed pvnieo closed 2 years ago
I sometimes get similar issues when I build on a desktop that goes into sleep mode while I am compiling. Re-cloning the repository and trying again, taking care that the computer does not go to sleep during the build, might help. Also, since everything built before the failure is saved in the cache, rerunning the same command will probably be faster.
Hi,
Thank you for your response. Actually, I'm using a server, so there is no risk of the desktop going into sleep mode.
I cloned the repo again and executed the same command, but I'm still getting the same error (the build ran faster than before, maybe because of the cache).
#24 22.74 [1304/6115] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/qs8-vadd/gen/minmax-sse41-mul32-ld32-x16.c.o
#24 22.76 [1305/6115] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/qs8-vadd/gen/minmax-sse41-mul32-ld32-x8.c.o
#24 22.76 [1306/6115] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/qs8-vadd/gen/minmax-sse41-mul32-ld32-x24.c.o
#24 22.76 [1307/6115] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/qs8-vadd/gen/minmax-sse41-mul32-ld32-x32.c.o
#24 23.59 [1308/6115] Generating src/x86_64-fma/2d-fourier-8x8.py.o
#24 23.59 ninja: build stopped: subcommand failed.
------
executor failed running [/bin/sh -c USE_CUDA=1 USE_CUDNN=1 TORCH_NVCC_FLAGS=${TORCH_NVCC_FLAGS} TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST} CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" python setup.py bdist_wheel -d /tmp/dist]: exit code: 1
make: *** [Makefile:104: build-torch-full] Error 1
What should I do?
As for myself, I just kept repeating the command until the build finished. I know that this is not a great solution, but I haven't faced any problems yet. I have not hit this issue on my server so far, and it appears to be an issue with PyTorch itself. I'm sorry, but I don't think that I can help more with this.
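For anyone who wants to automate the "keep repeating the command" workaround, here is a minimal sketch. The retry helper is a hypothetical name and is not part of this repository; the make invocation in the comment is the one from the original post. It only reruns the command and does nothing about the underlying failure.

```shell
# retry MAX_TRIES COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have been made.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    max_tries=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_tries failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (not executed here):
# retry 5 make all-full CC="8.0" TRAIN_NAME=train_cu102
```

Because the Docker build cache keeps the layers that already succeeded, each retry should only redo the failing step.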
From the forum discussion, I think that setting the USE_ROCM=0 flag explicitly during the build may be helpful. Please try it out and see if it works.
Also, I have checked the build results and confirmed that ld points to /usr/bin/ld, as mentioned in pytorch/pytorch#32694. Setting this to /opt/conda/compiler_compat/ld may work.
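To see which linker a build would actually pick up, something like the following can be run inside the build image. This is only a sketch: resolve_tool is a hypothetical helper name, and it assumes a readlink that supports -f (GNU coreutils or BusyBox).

```shell
# resolve_tool NAME
# Prints the fully resolved path of NAME as found on PATH,
# following symlinks. Returns 1 if NAME is not on PATH.
resolve_tool() {
    tool_path=$(command -v "$1") || {
        echo "$1: not found on PATH" >&2
        return 1
    }
    readlink -f "$tool_path"
}

# Example (not executed here):
# resolve_tool ld
```

If the output is /usr/bin/ld or a binutils binary that it links to, the build is using the system linker rather than the one under /opt/conda/compiler_compat.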
I've added the ROCM flag. Please try it out.
Hi,
Thanks for your help.
Actually, I tried repeating the command, and also cloned the repo with the ROCM change, but it still failed!
#24 22.30 [1833/6115] Generating src/x86_64-fma/2d-fourier-8x8.py.o
#24 22.30 ninja: build stopped: subcommand failed.
------
executor failed running [/bin/sh -c USE_CUDA=1 USE_CUDNN=1 USE_ROCM=0 TORCH_NVCC_FLAGS=${TORCH_NVCC_FLAGS} TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST} CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" python setup.py bdist_wheel -d /tmp/dist]: exit code: 1
make: *** [Makefile:104: build-torch-full] Error 1
Actually, I tried the same build on another server that has a different architecture, and the build compiled successfully!
Still, I don't know what the problem is on the other server!
Try adding RUN ln -sf /opt/conda/compiler_compat/ld /usr/bin/ld before the build begins.
I've placed the conda directory at the end of the PATH variable for the build image, and it worked on my laptop without any hiccups. I do not know whether this solves the problem, but I will close this issue for now. Please reopen it if it still occurs.
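As a rough illustration of the PATH reordering described above, the sketch below moves one directory to the end of a PATH-style string so that system tools are found first. The function name move_to_end is hypothetical, and /opt/conda/bin is an assumed location for the conda binaries; the actual change in the build image may be written differently.

```shell
# move_to_end DIR PATHSTRING
# Prints PATHSTRING with every occurrence of DIR removed
# and a single DIR appended at the end.
move_to_end() {
    dir=$1
    rest=
    old_ifs=$IFS
    IFS=:
    for entry in $2; do
        if [ "$entry" != "$dir" ]; then
            rest=${rest:+$rest:}$entry
        fi
    done
    IFS=$old_ifs
    printf '%s\n' "${rest:+$rest:}$dir"
}

# Example (not executed here; /opt/conda/bin is an assumption):
# PATH=$(move_to_end /opt/conda/bin "$PATH")
```

With the conda directory last, a lookup for ld finds the system linker first, while conda-only tools remain reachable.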
Hi,
I cloned the main branch from GitHub and then executed the following command:
make all-full CC="8.0" TRAIN_NAME=train_cu102
but I got the following error:
Is this normal? What should I do? Thank you in advance!