KaihuaTang / Scene-Graph-Benchmark.pytorch

A new codebase for popular Scene Graph Generation methods (2020). Visualization and scene graph extraction on custom images/datasets are provided. It is also a PyTorch implementation of the paper “Unbiased Scene Graph Generation from Biased Training” (CVPR 2020).
MIT License
1.03k stars, 228 forks

CUDA error: device-side assert triggered #121

Closed. XuMengyaAmy closed this issue 3 years ago.

XuMengyaAmy commented 3 years ago

❓ Questions and Help

Hi, thanks for your work. I am using the VG100K dataset with your code and running the command from Training Example 2 (SGCls, Causal, TDE, SUM Fusion, MOTIFS Model).

The "validation before training" step runs without problems and reports:

    SGG eval: A @ 20: 0.0000; A @ 50: 0.0000; A @ 100: 0.0000; for mode=sgcls, type=TopK Accuracy.

However, training fails with the following error:

    File "./maskrcnn_benchmark/modeling/rpn/loss.py", line 106, in __call__
        sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
    File "./maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py", line 53, in __call__
        neg_idx_per_image = negative[perm2]
    RuntimeError: CUDA error: device-side assert triggered
    index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

My first guess was that the class labels are larger than the number of model outputs, but after checking, they seem fine.

I printed negative and perm2 and found that some elements of perm2, which is used as the index into negative, are too large. Have you run into this error before? Hope to get your reply soon.
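The failed assertion is PyTorch's index bounds check: for a dimension of size n, every index i must satisfy -n <= i < n. When comparing printed values of perm2 against negative.numel(), a minimal pure-Python sketch of that same condition can list the offending entries (convert the tensor with perm2.tolist() first). The function name out_of_bounds and the sample values are hypothetical.

```python
from typing import List, Sequence


def out_of_bounds(indices: Sequence[int], size: int) -> List[int]:
    """Return the indices that violate -size <= index < size.

    This mirrors the condition in the failed assertion
    (index >= -sizes[i] && index < sizes[i]) for a single dimension.
    """
    return [i for i in indices if not (-size <= i < size)]


if __name__ == "__main__":
    # Hypothetical stand-ins for perm2.tolist() and negative.numel().
    perm2 = [3, 0, 7, 12]
    print(out_of_bounds(perm2, 8))  # [12]
```

An empty result means every index is in range for that dimension.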

Thanks, Mengya

XuMengyaAmy commented 3 years ago

Solved.

hflsdupont commented 3 years ago

Hey! I ran into the same problem using the default parameters. How did you solve this issue?

Olafyii commented 3 years ago

I ran into the same issue and solved it by updating my PyTorch version, per https://github.com/pytorch/pytorch/pull/55292
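As far as I can tell, maskrcnn-benchmark's sampler builds perm2 with torch.randperm(..., device=negative.device). Assuming the linked PR fixes torch.randperm returning out-of-range values on the GPU (the thread does not say, and I have not verified the PR's contents), one rough way to check whether an installed build is affected is to confirm that the permutation sorts back to 0..n-1. The helper name randperm_is_valid and the sizes tried are illustrative; I don't know which n actually trigger the bug.

```python
import torch


def randperm_is_valid(n: int, device: str) -> bool:
    """Return True if torch.randperm(n) on `device` is a true permutation of 0..n-1."""
    perm = torch.randperm(n, device=device)
    # A valid permutation, once sorted, equals arange(n).
    expected = torch.arange(n, device=device)
    return bool(torch.equal(torch.sort(perm).values, expected))


if __name__ == "__main__":
    # Falls back to the CPU when no GPU is present (then it only tests the CPU path).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for n in (10, 1000, 100000):
        print(n, device, randperm_is_valid(n, device))
```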

luckyyy00 commented 2 years ago

Hi Xu, how did you solve this issue?

zbw4034 commented 1 year ago

I ran into this problem as well. It seems to be a common problem inherited from maskrcnn-benchmark, so anyone hitting it can also check the related issues in the maskrcnn-benchmark repository. By the way, I found Xu's blog post (https://blog.csdn.net/weixin_43332432/article/details/115493442), which notes that removing the "device=negative.device" argument in maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py solves the issue. That worked for me.
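Removing device=negative.device makes torch.randperm generate the permutation on the CPU instead of the GPU. The sketch below is a simplified, hypothetical version of a balanced positive/negative sampler with that workaround applied; it is modeled loosely on the maskrcnn-benchmark file, not copied from it. The function name sample_pos_neg, its parameters, and the label convention (>= 1 foreground, 0 background, -1 ignored) are assumptions.

```python
import torch


def sample_pos_neg(matched_idxs: torch.Tensor,
                   batch_size_per_image: int = 256,
                   positive_fraction: float = 0.5):
    """Hypothetical simplified balanced sampler with the CPU-randperm workaround.

    Assumed label convention: >= 1 is foreground, 0 is background, -1 is ignored.
    """
    positive = torch.nonzero(matched_idxs >= 1).squeeze(1)
    negative = torch.nonzero(matched_idxs == 0).squeeze(1)

    # Cap both counts by how many candidates actually exist.
    num_pos = min(positive.numel(), int(batch_size_per_image * positive_fraction))
    num_neg = min(negative.numel(), batch_size_per_image - num_pos)

    # Workaround from the thread: no device= argument, so randperm runs on the CPU.
    perm1 = torch.randperm(positive.numel())[:num_pos]
    perm2 = torch.randperm(negative.numel())[:num_neg]

    # Explicitly move the CPU indices to the data's device before indexing.
    pos_idx = positive[perm1.to(positive.device)]
    neg_idx = negative[perm2.to(negative.device)]
    return pos_idx, neg_idx


if __name__ == "__main__":
    labels = torch.tensor([1, 0, 0, -1, 2, 0])
    if torch.cuda.is_available():
        labels = labels.cuda()
    pos, neg = sample_pos_neg(labels, batch_size_per_image=4)
    print(pos.tolist(), neg.tolist())
```

The explicit .to(...) is an extra step; indexing a CUDA tensor with CPU indices also works, which is what the one-line workaround relies on. The trade-off is a CPU permutation plus a small host-to-device copy per image instead of a GPU randperm.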