error trying to train on my custom dataset with multiple classes

bat3a commented 4 years ago

Hi, I get an error when I try to train on my custom dataset with multiple classes:

```
ad: [47,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
Traceback (most recent call last):
  File "train.py", line 504, in <module>
    train()
  File "train.py", line 307, in train
    losses = net(datum)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "train.py", line 146, in forward
    losses = self.criterion(self.net, preds, targets, masks, num_crowds)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/me/Desktop/deep/suhaila/yolact/layers/modules/multibox_loss.py", line 194, in forward
    losses['S'] = self.semantic_segmentation_loss(predictions['segm'], masks, labels)
  File "/home/me/Desktop/deep/suhaila/yolact/layers/modules/multibox_loss.py", line 235, in semantic_segmentation_loss
    segment_t[cur_class_t[obj_idx]] = torch.max(segment_t[cur_class_t[obj_idx]], downsampled_masks[obj_idx])
RuntimeError: CUDA error: device-side assert triggered
```

When I change all categories to 1, it runs! When I change back to multiple categories, it gives the error.

My config:

```python
foods_dataset = dataset_base.copy({
    'name': 'foods dataset',

    'train_images': '../data/foods_dataset/train',
    'train_info':   '../data/foods_dataset/train/annotations_coco.json',

    'valid_images': '../data/foods_dataset/val',
    'valid_info':   '../data/foods_dataset/val/annotations_coco.json',

    'has_gt': True,
    'class_names': ('banana', 'onions', 'figs', 'pear', 'rice', 'qiwi', 'plum',
                    'lemon', 'pringles', 'peach', 'tomato', 'potato', 'Cantaloupe',
                    'cucumber', 'Roasted Chicken Breast', 'grape', 'apple',
                    'bread', 'carrot'),

    'label_map': { 1:  1,  2:  2,  3:  3,  4:  4,  5:  5,  6:  6,  7:  7,  8:  8,
                   9:  9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16,
                  17: 17, 18: 18, 19: 19, 20: 20, 21: 21}
})
```
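One way to catch a mismatch like this before training is to compare the label map against the class list. The sketch below uses a hypothetical helper, `check_label_map` (not part of YOLACT), and assumes YOLACT's convention that `label_map` values are 1-based class ids that must fall in `1..len(class_names)`. Applied to the config above, it flags the entries for ids 20 and 21, because only 19 class names are listed.

```python
import json


# Hypothetical sanity check, not part of YOLACT. Assumes label_map values are
# 1-based class ids, i.e. they must lie in 1..len(class_names).
def check_label_map(class_names, label_map, category_ids=()):
    """Return a list of human-readable problems with a dataset config."""
    num_classes = len(class_names)
    problems = []
    # Every mapped value must name one of the configured classes (1..N).
    for cat_id, label in sorted(label_map.items()):
        if not 1 <= label <= num_classes:
            problems.append(
                f"category_id {cat_id} maps to {label}, outside 1..{num_classes}"
            )
    # Every category id used in the annotations must have a mapping.
    for cat_id in sorted(set(category_ids)):
        if cat_id not in label_map:
            problems.append(f"category_id {cat_id} has no entry in label_map")
    return problems


def category_ids_from_coco(path):
    """Collect the category_id of every annotation in a COCO-format JSON file."""
    with open(path) as f:
        coco = json.load(f)
    return [ann['category_id'] for ann in coco['annotations']]


class_names = ('banana', 'onions', 'figs', 'pear', 'rice', 'qiwi', 'plum',
               'lemon', 'pringles', 'peach', 'tomato', 'potato', 'Cantaloupe',
               'cucumber', 'Roasted Chicken Breast', 'grape', 'apple', 'bread',
               'carrot')
label_map = {i: i for i in range(1, 22)}  # the 21-entry map from the config above

for problem in check_label_map(class_names, label_map):
    print(problem)
# category_id 20 maps to 20, outside 1..19
# category_id 21 maps to 21, outside 1..19
```

To also check the annotation file itself, pass `category_ids_from_coco('../data/foods_dataset/train/annotations_coco.json')` as the third argument.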

bat3a commented 4 years ago

Any help would be appreciated!

abhigoku10 commented 4 years ago

@bat3a When you keep it at 1, can you check your GPU usage with nvidia-smi, since the error is in CUDA?

bat3a commented 4 years ago

> @bat3a When you keep it at 1, can you check your GPU usage with nvidia-smi, since the error is in CUDA?

(screenshot: id1)

abhigoku10 commented 4 years ago

@bat3a Why are you getting so many size mismatch warnings? Can you please check your input config and pretrained model?

XunHang commented 4 years ago

You have 19 classes, so your label map should match that number. Change your label map as below and train the net again: `'label_map': { 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19}` (I solved this same problem today with this fix.)
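The corrected map is simply the identity over `1..len(class_names)`. If the annotation file's category ids are not exactly 1 to N (for example, if some ids were skipped when the dataset was exported), a map can instead be derived from the COCO `categories` list. The sketch below is a hypothetical helper, not part of YOLACT, and assumes every category name in the annotation file also appears in `class_names`.

```python
import json


# Simple case (the fix above): category ids are exactly 1..N, so the map is
# the identity with one entry per class name.
def identity_label_map(class_names):
    return {i: i for i in range(1, len(class_names) + 1)}


# Hypothetical helper, not part of YOLACT: map each COCO category id to the
# 1-based position of its name in class_names, so non-contiguous ids still
# end up in 1..len(class_names).
def label_map_from_categories(categories, class_names):
    position = {name: i + 1 for i, name in enumerate(class_names)}
    label_map = {}
    for cat in categories:
        if cat['name'] not in position:
            raise ValueError(f"category {cat['name']!r} is not in class_names")
        label_map[cat['id']] = position[cat['name']]
    return label_map


def load_categories(path):
    """Read the 'categories' list from a COCO-format annotation file."""
    with open(path) as f:
        return json.load(f)['categories']


if __name__ == '__main__':
    names = ('banana', 'onions', 'figs')
    print(identity_label_map(names))  # {1: 1, 2: 2, 3: 3}
    cats = [{'id': 1, 'name': 'banana'}, {'id': 3, 'name': 'onions'},
            {'id': 7, 'name': 'figs'}]
    print(label_map_from_categories(cats, names))  # {1: 1, 3: 2, 7: 3}
```

Deriving the map by name rather than by id keeps it tied to the order of `class_names`, which is what the config's class count is based on.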

bat3a commented 4 years ago

@XunHang thank you, that solved it.

dbolya / yolact #333