aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet

CUDA driver error: a PTX JIT compilation failed #195

Closed Yuuuuuuuuuuuuuuuuuummy closed 4 years ago

Yuuuuuuuuuuuuuuuuuummy commented 4 years ago

Hi, when I run CondInst, I get the following error:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/l547/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/data/zt/detectron2/detectron2/engine/launch.py", line 94, in _distributed_worker
    main_func(*args)
  File "/data/zt/AdelaiDet/tools/train_net.py", line 231, in main
    return trainer.train()
  File "/data/zt/AdelaiDet/tools/train_net.py", line 113, in train
    self.train_loop(self.start_iter, self.max_iter)
  File "/data/zt/AdelaiDet/tools/train_net.py", line 102, in train_loop
    self.run_step()
  File "/data/zt/detectron2/detectron2/engine/train_loop.py", line 227, in run_step
    loss_dict = self.model(data)
  File "/home/l547/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/l547/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/l547/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/zt/AdelaiDet/adet/modeling/one_stage_detector.py", line 46, in forward
    return super().forward(batched_inputs)
  File "/data/zt/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 285, in forward
    proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
  File "/home/l547/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/zt/AdelaiDet/adet/modeling/fcos/fcos.py", line 90, in forward
    results, losses = self.fcos_outputs.losses(
  File "/data/zt/AdelaiDet/adet/modeling/fcos/fcos_outputs.py", line 313, in losses
    return self.fcos_losses(instances)
  File "/data/zt/AdelaiDet/adet/modeling/fcos/fcos_outputs.py", line 331, in fcos_losses
    class_loss = sigmoid_focal_loss_jit(
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: CUDA driver error: a PTX JIT compilation failed

How can I fix this, please?

tianzhi0549 commented 4 years ago

@Yuuuuuuuuuuuuuuuuuummy This issue is most likely related to your environment, for example the PyTorch version, the CUDA version, and so on. Please check them. If you still cannot solve it, please try the Docker image we provide.
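
For anyone who wants to follow the "check your environment" advice, here is a minimal sketch of a script that gathers the relevant versions in one place. It is not part of AdelaiDet; the function name `collect_env_report` is made up for illustration. It assumes PyTorch may or may not be installed, and that `nvidia-smi` may or may not be on the PATH. A mismatch between the CUDA version PyTorch was built with and what the installed driver supports is one common cause of PTX JIT failures, though this thread does not confirm that it was the cause here.

```python
import shutil
import subprocess


def collect_env_report():
    """Gather PyTorch, CUDA and driver details into a dict (hypothetical helper)."""
    report = {"torch": None, "driver": None}

    # PyTorch may be missing or broken; treat that as "not available".
    try:
        import torch
    except (ImportError, OSError):
        torch = None

    if torch is not None:
        report["torch"] = torch.__version__
        # CUDA toolkit version PyTorch was built against (None for CPU builds).
        report["torch_cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            # Name and compute capability (major, minor) of each visible GPU.
            report["gpus"] = [
                (torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
                for i in range(torch.cuda.device_count())
            ]
            # Older PyTorch releases do not have get_arch_list, so guard it.
            if hasattr(torch.cuda, "get_arch_list"):
                report["compiled_archs"] = torch.cuda.get_arch_list()

    # Ask nvidia-smi for the driver version if the tool is installed.
    if shutil.which("nvidia-smi"):
        try:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            lines = out.stdout.strip().splitlines()
            if out.returncode == 0 and lines:
                report["driver"] = lines[0].strip()
        except (OSError, subprocess.SubprocessError):
            pass  # leave the driver as None if the query fails

    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Comparing `torch_cuda` against the driver version, and the GPU compute capability against `compiled_archs`, is usually enough to spot a mismatch before falling back to the Docker image.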

Yuuuuuuuuuuuuuuuuuummy commented 4 years ago

@tianzhi0549 thx! I will try it!