Open lsj1111 opened 4 months ago
You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template. The following information is missing: "Instructions To Reproduce the Issue and Full Logs";
Hi, this issue seems to stem from PyTorch itself. See PyTorch issue #87595: it was first reported in 2022 and an update has since been pushed. The following command should help you get the latest version:
pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/test/cu117/torch_test.html
If there are still any issues, please feel free to comment :)
Yeah, thank you, I have solved this problem. The official detectron2 documentation seems to say that only CUDA 11.3 is supported, so I used CUDA 11.3, which caused the problem above. I then found that detectron2 also works with CUDA 11.6, which solved the problem.
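For anyone who lands here with the same nvrtc error, a quick check like the one below may help narrow things down. It is only a sketch: the helper names (`arch_tag`, `build_lists_arch`, `describe_local_gpu`) are made up for illustration, and it assumes PyTorch is importable in the environment being checked. It compares the GPU's compute capability with the architectures the installed PyTorch build lists.

```python
def arch_tag(capability):
    """Format a (major, minor) compute capability as an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_lists_arch(capability, arch_list):
    """Return True if the build's arch list names this GPU's architecture."""
    return arch_tag(capability) in arch_list


def describe_local_gpu():
    """Report the local setup, or return None if torch or CUDA is unavailable."""
    try:
        import torch  # imported lazily so the helpers above work without it
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    return {
        "torch_cuda": torch.version.cuda,      # CUDA version torch was built with
        "gpu_arch": arch_tag(capability),      # e.g. 'sm_89' for capability (8, 9)
        "build_arch_list": arch_list,          # architectures the build lists
        "listed": build_lists_arch(capability, arch_list),
    }
```

Calling `describe_local_gpu()` in the affected environment prints nothing by itself; inspect the returned dict. If `listed` is False, the installed build may predate the GPU, which would be consistent with the error above, though a missing entry does not by itself prove that this is the cause.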
ah ok great :) 👍🏽
The environment is:
sys.platform linux
Python 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
numpy 1.24.3
detectron2 0.6 @/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2
Compiler GCC 7.5
CUDA compiler CUDA 11.3
detectron2 arch flags /media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2/_C.cpython-38-x86_64-linux-gnu.so; cannot find cuobjdump
DETECTRON2_ENV_MODULE
PyTorch 1.10.0 @/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0,1 NVIDIA GeForce RTX 4090 (arch=8.9)
Driver version 535.183.01
CUDA_HOME :/usr/local/cuda - invalid!
Pillow 10.3.0
torchvision 0.11.0 @/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torchvision
torchvision arch flags /media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torchvision/_C.so; cannot find cuobjdump
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.10.0
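The listing above flags CUDA_HOME as invalid and reports that cuobjdump cannot be found, so it may be worth confirming what the toolkit installation actually looks like. A minimal sketch, using only the standard library; the function name `check_cuda_toolkit` and the idea that nvcc lives under `bin/` inside CUDA_HOME are assumptions for illustration, not part of detectron2:

```python
import os
import shutil


def check_cuda_toolkit(cuda_home, which=shutil.which):
    """Summarize whether a CUDA toolkit looks usable from this environment.

    cuda_home: value of the CUDA_HOME variable (may be None or empty).
    which: lookup function for executables on PATH (swappable for testing).
    """
    # Assumed layout: nvcc sits in <CUDA_HOME>/bin on a typical Linux install.
    nvcc = os.path.join(cuda_home, "bin", "nvcc") if cuda_home else None
    return {
        "cuda_home_set": bool(cuda_home),
        "cuda_home_exists": bool(cuda_home) and os.path.isdir(cuda_home),
        "nvcc_found": bool(nvcc) and os.path.isfile(nvcc),
        "cuobjdump_on_path": which("cuobjdump") is not None,
    }
```

For example, `check_cuda_toolkit(os.environ.get("CUDA_HOME"))` returns a dict of booleans. If `cuda_home_exists` is False, the variable likely points at a directory that is not there, which would match the "invalid!" marker in the listing.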
This problem arises when I execute train_net.py:
[07/01 22:30:46] d2.engine.train_loop INFO: Starting training from iteration 0
[07/01 22:30:47] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2/engine/train_loop.py", line 155, in train
    self.run_step()
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2/engine/defaults.py", line 498, in run_step
    self._trainer.run_step()
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MASKDINO/detectron2-main/detectron2/engine/train_loop.py", line 495, in run_step
    loss_dict = self.model(data)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MaskDINO-main/maskdino/maskdino.py", line 267, in forward
    losses = self.criterion(outputs, targets,mask_dict)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MaskDINO-main/maskdino/modeling/criterion.py", line 357, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MaskDINO-main/maskdino/modeling/matcher.py", line 220, in forward
    return self.memory_efficient_forward(outputs, targets, cost)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/anaconda3/envs/maskdino/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/media/hky/d2bed7b9-228d-4a5b-ae76-f3f34ce12c7b/hky/lsj/code/MaskDINO-main/maskdino/modeling/matcher.py", line 169, in memory_efficient_forward
    cost_mask = batch_sigmoid_ce_loss_jit(out_mask, tgt_mask)
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}
template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}
extern "C" __global__
void fused_neg_add(float* ttargets_1, float* aten_add) {
{
  float v = __ldg(ttargets_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
  aten_add[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (0.f - v) + 1.f;
}
}