Hmm it'd be helpful if you could get the exact line that the error happens on (inside of the loss forward function), but for now can you send over your dataset and model configs? Maybe there's an issue with those.
Hello,
In train.py, the error happens at line 256, after the declaration of the variable criterion at line 163. More precisely, it happens when the forward method of the MultiBoxLoss class is implicitly called.
Note that I added several print statements to my copy of the file in order to investigate, which is why my line numbers differ from the original file.
I meant the line in multi_box.py, but it looks like the stack trace isn't printing that line out for you? The only call to max that I can see that would get triggered in multi_box.py is when computing the semantic segmentation loss, i.e., it would have to do with classes.
Can you send over your dataset and model configs so I can check if you set up the classes properly?
I'll try to investigate the structure of the dataset more deeply.
I just realized I misread your last response. The issue appears in the function match from box_utils.py, which is called in the forward method of the MultiBoxLoss class.
Here is the error message (note: the prints after "Begin training!" are mine):
%run train.py --config=yolact_base_config
loading annotations into memory...
Done (t=4.22s)
creating index...
index created!
loading annotations into memory...
Done (t=4.58s)
creating index...
index created!
Initializing weights...
NB CLASSES: 317
NB IMAGES TO LOAD: 625
Begin training!
<class 'dict'> dict_keys(['loc', 'conf', 'mask', 'priors', 'proto', 'segm']) <class '__main__.ScatterWrapper'> <class 'torch.Tensor'> torch.Size([8])
BEFORE MATCH: 0.5 0.4 torch.Size([0, 4]) torch.Size([19248, 4]) 0
BOX UTILS: torch.Size([0, 19248])
BEFORE MATCH: 0.5 0.4 torch.Size([0, 4]) torch.Size([19248, 4]) 0
BOX UTILS: torch.Size([0, 19248])
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/hdd1/prog/yolact/train.py in <module>()
520 tmask = wrapper.make_mask()
521 print(type(out), out.keys(), type(wrapper), type(wrapper.make_mask()), tmask.shape)
--> 522 losses = criterion(out, wrapper, wrapper.make_mask())
523
524 losses = { k: v.mean() for k,v in losses.items() } # Mean here because Dataparallel
~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
~/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
141 return self.module(*inputs[0], **kwargs[0])
142 replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
--> 143 outputs = self.parallel_apply(replicas, inputs, kwargs)
144 return self.gather(outputs, self.output_device)
145
~/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py in parallel_apply(self, replicas, inputs, kwargs)
151
152 def parallel_apply(self, replicas, inputs, kwargs):
--> 153 return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
154
155 def gather(self, outputs, output_device):
~/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py in parallel_apply(modules, inputs, kwargs_tup, devices)
81 output = results[i]
82 if isinstance(output, Exception):
---> 83 raise output
84 outputs.append(output)
85 return outputs
~/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py in _worker(i, module, input, kwargs, device)
57 if not isinstance(input, (list, tuple)):
58 input = (input,)
---> 59 output = module(*input, **kwargs)
60 with lock:
61 results[i] = output
~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
/hdd1/prog/yolact/layers/modules/multibox_loss.py in forward(self, predictions, wrapper, wrapper_mask)
138 match(self.pos_threshold, self.neg_threshold,
139 truths, defaults, labels[idx], crowd_boxes,
--> 140 loc_t, conf_t, idx_t, idx, loc_data[idx])
141
142 gt_box_t[idx, :, :] = truths[idx_t[idx]]
/hdd1/prog/yolact/layers/box_utils.py in match(pos_thresh, neg_thresh, truths, priors, labels, crowd_boxes, loc_t, conf_t, idx_t, idx, loc_data)
138
139 # Size [num_objects] best prior for each ground truth
--> 140 best_prior_overlap, best_prior_idx = overlaps.max(1)
141 # Size [num_priors] best ground truth for each prior
142 best_truth_overlap, best_truth_idx = overlaps.max(0)
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
As visible in my prints, I identified that some tensors are empty, but I am still investigating why they are empty. It seems to be linked to the JSON files; however, when I inspect them with pycocotools everything seems normal, in the sense that I did not encounter any errors or issues. I was able to work with the data and access the masks and annotations the way I usually do.
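For reference, the error itself is easy to reproduce in isolation, since a max reduction over a tensor with no elements has no identity element. A minimal sketch matching the shapes printed above:

import torch

# Minimal reproduction (with the PyTorch version used in this thread):
# overlaps has shape [num_gt, num_priors], and the BOX UTILS print above
# shows torch.Size([0, 19248]), i.e. zero ground-truth boxes for the image.
overlaps = torch.empty(0, 19248)

# Raises: RuntimeError: cannot perform reduction function max on tensor
# with no elements because the operation does not have an identity
best_prior_overlap, best_prior_idx = overlaps.max(1)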
Can you show me the dataset and model configs you set up in config.py? Looking at that line, it seems one of truths and decoded_priors is empty. If decoded_priors is empty, that means something in your model config is wrong and the config produced no anchor boxes. If truths is empty, then something in your dataset isn't configured properly (truths should never be empty because I ignore images without GT).
I have a feeling this is a simple mistake in one of your configs (not the COCO annotations, but the config.py configs).
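If you want to pinpoint which of the two is empty, a quick check you could drop in (a sketch, using the argument names from match's signature in the stack trace above) would fail fast with a readable message:

# Sanity check at the top of match() in box_utils.py (argument names taken
# from the stack trace above).
assert truths.numel() > 0, "no ground-truth boxes for this image: dataset config issue"
assert priors.numel() > 0, "no anchor boxes generated: model config issue"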
The dataset config looks like:
PASCAL_PART_CLASSES = ('aeroplane_body', 'aeroplane_engine_1',
'aeroplane_engine_2', 'aeroplane_engine_3',
'aeroplane_engine_4', 'aeroplane_engine_5',
'aeroplane_engine_6', 'aeroplane_lwing',
'aeroplane_rwing', 'aeroplane_stern',
'aeroplane_tail', 'aeroplane_wheel_1',
'aeroplane_wheel_2', 'aeroplane_wheel_3',
'aeroplane_wheel_4', 'aeroplane_wheel_5',
'aeroplane_wheel_6', 'aeroplane_wheel_7',
'aeroplane_wheel_8', 'aeroplane_whole',
'bicycle_bwheel', 'bicycle_chainwheel', 'bicycle_fwheel',
'bicycle_handlebar', 'bicycle_headlight_1',
'bicycle_saddle', 'bicycle_whole',
'bird_beak', 'bird_head', 'bird_leye', 'bird_lfoot',
'bird_lleg', 'bird_lwing', 'bird_neck', 'bird_reye',
'bird_rfoot', 'bird_rleg', 'bird_rwing', 'bird_tail',
'bird_torso', 'bird_whole', 'boat_whole', 'bottle_body',
'bottle_cap', 'bottle_whole', 'bus_backside',
'bus_bliplate', 'bus_door_1', 'bus_door_2', 'bus_door_3',
'bus_door_4', 'bus_fliplate', 'bus_frontside',
'bus_headlight_1', 'bus_headlight_2', 'bus_headlight_3',
'bus_headlight_4', 'bus_headlight_5', 'bus_headlight_6',
'bus_headlight_7', 'bus_headlight_8', 'bus_leftmirror',
'bus_leftside', 'bus_rightmirror', 'bus_rightside',
'bus_roofside', 'bus_wheel_1', 'bus_wheel_2',
'bus_wheel_3', 'bus_wheel_4', 'bus_wheel_5', 'bus_whole',
'bus_window_1', 'bus_window_10', 'bus_window_11',
'bus_window_12', 'bus_window_13', 'bus_window_14',
'bus_window_15', 'bus_window_16', 'bus_window_17',
'bus_window_18', 'bus_window_19', 'bus_window_2',
'bus_window_20', 'bus_window_3', 'bus_window_4',
'bus_window_5', 'bus_window_6', 'bus_window_7',
'bus_window_8', 'bus_window_9',
'car_backside', 'car_bliplate', 'car_door_1',
'car_door_2', 'car_door_3', 'car_fliplate',
'car_frontside', 'car_headlight_1', 'car_headlight_2',
'car_headlight_3', 'car_headlight_4', 'car_headlight_5',
'car_headlight_6', 'car_leftmirror', 'car_leftside',
'car_rightmirror', 'car_rightside', 'car_roofside',
'car_wheel_1', 'car_wheel_2', 'car_wheel_3',
'car_wheel_4', 'car_wheel_5', 'car_whole', 'car_window_1',
'car_window_2', 'car_window_3', 'car_window_4',
'car_window_5', 'car_window_6', 'car_window_7',
'cat_head', 'cat_lbleg', 'cat_lbpa', 'cat_lear',
'cat_leye', 'cat_lfleg', 'cat_lfpa', 'cat_neck',
'cat_nose', 'cat_rbleg', 'cat_rbpa', 'cat_rear',
'cat_reye', 'cat_rfleg', 'cat_rfpa', 'cat_tail',
'cat_torso', 'cat_whole', 'chair_whole', 'cow_head',
'cow_lblleg', 'cow_lbuleg', 'cow_lear', 'cow_leye',
'cow_lflleg', 'cow_lfuleg', 'cow_lhorn', 'cow_muzzle',
'cow_neck', 'cow_rblleg', 'cow_rbuleg', 'cow_rear',
'cow_reye', 'cow_rflleg', 'cow_rfuleg', 'cow_rhorn',
'cow_tail', 'cow_torso', 'cow_whole',
'dog_head', 'dog_lbleg', 'dog_lbpa', 'dog_lear',
'dog_leye', 'dog_lfleg', 'dog_lfpa', 'dog_muzzle',
'dog_neck', 'dog_nose', 'dog_rbleg', 'dog_rbpa',
'dog_rear', 'dog_reye', 'dog_rfleg', 'dog_rfpa',
'dog_tail', 'dog_torso', 'dog_whole',
'horse_head', 'horse_lbho', 'horse_lblleg', 'horse_lbuleg',
'horse_lear', 'horse_leye', 'horse_lfho', 'horse_lflleg',
'horse_lfuleg', 'horse_muzzle', 'horse_neck', 'horse_rbho',
'horse_rblleg', 'horse_rbuleg', 'horse_rear', 'horse_reye',
'horse_rfho', 'horse_rflleg', 'horse_rfuleg', 'horse_tail',
'horse_torso', 'horse_whole',
'motorbike_bwheel', 'motorbike_fwheel',
'motorbike_handlebar', 'motorbike_headlight_1',
'motorbike_headlight_2', 'motorbike_headlight_3',
'motorbike_saddle', 'motorbike_whole',
'person_hair', 'person_head', 'person_lear',
'person_lebrow', 'person_leye', 'person_lfoot',
'person_lhand', 'person_llarm', 'person_llleg',
'person_luarm', 'person_luleg', 'person_mouth',
'person_neck', 'person_nose', 'person_rear',
'person_rebrow', 'person_reye', 'person_rfoot',
'person_rhand', 'person_rlarm', 'person_rlleg',
'person_ruarm', 'person_ruleg', 'person_torso',
'person_whole',
'pottedplant_plant', 'pottedplant_pot', 'pottedplant_whole',
'sheep_head', 'sheep_lblleg', 'sheep_lbuleg', 'sheep_lear',
'sheep_leye', 'sheep_lflleg', 'sheep_lfuleg', 'sheep_lhorn',
'sheep_muzzle', 'sheep_neck', 'sheep_rblleg', 'sheep_rbuleg',
'sheep_rear', 'sheep_reye', 'sheep_rflleg', 'sheep_rfuleg',
'sheep_rhorn', 'sheep_tail', 'sheep_torso', 'sheep_whole',
'sofa_whole',
'table_whole',
'train_cbackside_1', 'train_cbackside_2',
'train_cfrontside_1', 'train_cfrontside_2',
'train_cfrontside_3', 'train_cfrontside_4',
'train_cfrontside_5', 'train_cfrontside_6',
'train_cfrontside_7', 'train_cfrontside_9',
'train_cleftside_1', 'train_cleftside_2',
'train_cleftside_3', 'train_cleftside_4',
'train_cleftside_5', 'train_cleftside_6',
'train_cleftside_7', 'train_cleftside_8',
'train_cleftside_9', 'train_coach_1',
'train_coach_2', 'train_coach_3', 'train_coach_4',
'train_coach_5', 'train_coach_6', 'train_coach_7',
'train_coach_8', 'train_coach_9', 'train_crightside_1',
'train_crightside_2', 'train_crightside_3',
'train_crightside_4', 'train_crightside_5',
'train_crightside_6', 'train_crightside_7',
'train_crightside_8', 'train_croofside_1',
'train_croofside_2', 'train_croofside_3',
'train_croofside_4', 'train_croofside_5',
'train_hbackside', 'train_head', 'train_headlight_1',
'train_headlight_2', 'train_headlight_3',
'train_headlight_4', 'train_headlight_5',
'train_hfrontside', 'train_hleftside', 'train_hrightside',
'train_hroofside', 'train_whole',
'tvmonitor_screen', 'tvmonitor_whole'
)
PASCAL_PART_LABEL_MAP = {1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8, 9:9, 10:10,
11:11, 12:12, 13:13, 14:14, 15:15, 16:16, 17:17, 18:18, 19:19, 20:20, 21:21,
22:22, 23:23, 24:24, 25:25, 26:26, 27:27, 28:28, 29:29, 30:30, 31:31, 32:32,
33:33, 34:34, 35:35, 36:36, 37:37, 38:38, 39:39, 40:40, 41:41, 42:42, 43:43,
44:44, 45:45, 46:46, 47:47, 48:48, 49:49, 50:50, 51:51, 52:52, 53:53, 54:54,
55:55, 56:56, 57:57, 58:58, 59:59, 60:60, 61:61, 62:62, 63:63, 64:64, 65:65,
66:66, 67:67, 68:68, 69:69, 70:70, 71:71, 72:72, 73:73, 74:74, 75:75, 76:76,
77:77, 78:78, 79:79, 80:80, 81:81, 82:82, 83:83, 84:84, 85:85, 86:86, 87:87,
88:88, 89:89, 90:90, 91:91, 92:92, 93:93, 94:94, 95:95, 96:96, 97:97, 98:98,
99:99, 100:100, 101:101, 102:102, 103:103, 104:104, 105:105, 106:106, 107:107,
108:108, 109:109, 110:110, 111:111, 112:112, 113:113, 114:114, 115:115, 116:116,
117:117, 118:118, 119:119, 120:120, 121:121, 122:122, 123:123, 124:124, 125:125,
126:126, 127:127, 128:128, 129:129, 130:130, 131:131, 132:132, 133:133, 134:134,
135:135, 136:136, 137:137, 138:138, 139:139, 140:140, 141:141, 142:142, 143:143,
144:144, 145:145, 146:146, 147:147, 148:148, 149:149, 150:150, 151:151, 152:152,
153:153, 154:154, 155:155, 156:156, 157:157, 158:158, 159:159, 160:160, 161:161,
162:162, 163:163, 164:164, 165:165, 166:166, 167:167, 168:168, 169:169, 170:170,
171:171, 172:172, 173:173, 174:174, 175:175, 176:176, 177:177, 178:178, 179:179,
180:180, 181:181, 182:182, 183:183, 184:184, 185:185, 186:186, 187:187, 188:188,
189:189, 190:190, 191:191, 192:192, 193:193, 194:194, 195:195, 196:196, 197:197,
198:198, 199:199, 200:200, 201:201, 202:202, 203:203, 204:204, 205:205, 206:206,
207:207, 208:208, 209:209, 210:210, 211:211, 212:212, 213:213, 214:214, 215:215,
216:216, 217:217, 218:218, 219:219, 220:220, 221:221, 222:222, 223:223, 224:224,
225:225, 226:226, 227:227, 228:228, 229:229, 230:230, 231:231, 232:232, 233:233,
234:234, 235:235, 236:236, 237:237, 238:238, 239:239, 240:240, 241:241, 242:242,
243:243, 244:244, 245:245, 246:246, 247:247, 248:248, 249:249, 250:250, 251:251,
252:252, 253:253, 254:254, 255:255, 256:256, 257:257, 258:258, 259:259, 260:260,
261:261, 262:262, 263:263, 264:264, 265:265, 266:266, 267:267, 268:268, 269:269,
270:270, 271:271, 272:272, 273:273, 274:274, 275:275, 276:276, 277:277, 278:278,
279:279, 280:280, 281:281, 282:282, 283:283, 284:284, 285:285, 286:286, 287:287,
288:288, 289:289, 290:290, 291:291, 292:292, 293:293, 294:294, 295:295, 296:296,
297:297, 298:298, 299:299, 300:300, 301:301, 302:302, 303:303, 304:304, 305:305,
306:306, 307:307, 308:308, 309:309, 310:310, 311:311, 312:312, 313:313, 314:314,
315:315, 316:316}
pascalpart2012_dataset = dataset_base.copy({
'name': 'PASCAL PART 2012',
'train_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
'train_info':'/media/smile/45C142AD782A7053/Datasets/PASCAL_PART/annotation_json/instances_pascal_part_train.json',
'valid_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
'valid_info':'/media/smile/45C142AD782A7053/Datasets/PASCAL_PART/annotation_json/instances_pascal_part_val.json',
'label_map': PASCAL_PART_LABEL_MAP,
'class_names': PASCAL_PART_CLASSES,
})
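(Side note: since the label map above is just the identity on 1..316, it could equivalently be generated instead of written out:)

# Equivalent to the explicit identity map above (316 classes).
PASCAL_PART_LABEL_MAP = {i: i for i in range(1, 317)}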
I did not create a model config specifically for this dataset. From my experiments with Pascal VOC, I found that just setting the dataset and num_classes fields of yolact_base_config to refer to the dataset was enough.
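In other words, something along these lines (a sketch, assuming the same copy() pattern used for dataset_base above; the config name is mine):

# Hypothetical config name; reuses yolact_base_config and only swaps the
# dataset-dependent fields, following the dataset_base.copy() pattern above.
yolact_pascal_part_config = yolact_base_config.copy({
    'name': 'yolact_pascal_part',
    'dataset': pascalpart2012_dataset,
    # 316 part classes + 1 background class
    'num_classes': len(pascalpart2012_dataset.class_names) + 1,
})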
Note that the json files were generated with https://github.com/waspinator/pycococreator. I checked the result using pycocotools and it works: I was able to access the data, categories, and annotations without issues.
I also concluded over the weekend that the issue likely comes from my settings. I am investigating to find out what's wrong.
Just out of curiosity, what would happen if this error were ignored?
Ignoring the error will just lead to another one down the line, since something (and by looking at your settings, I think truths) is empty when it shouldn't be.
Is there any annotation in the dataset with a segmentation but no bounding box, or with a bounding box but no segmentation? Or are there any images with "iscrowd" set to 1 for all annotations for that image?
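For what it's worth, both conditions can be checked directly from the JSON (a sketch, assuming the standard COCO layout produced by pycococreator; the filename is a placeholder):

import json

# Placeholder path: point this at one of the generated annotation files.
with open('instances_pascal_part_train.json') as f:
    coco = json.load(f)

by_image = {}
for ann in coco['annotations']:
    # Annotations missing either a bbox or a segmentation.
    if not ann.get('bbox') or not ann.get('segmentation'):
        print('incomplete annotation:', ann['id'])
    by_image.setdefault(ann['image_id'], []).append(ann)

# Images where every annotation is flagged as a crowd.
for image_id, anns in by_image.items():
    if all(a.get('iscrowd', 0) == 1 for a in anns):
        print('all annotations are iscrowd for image:', image_id)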
I set "iscrowd" to one for every annotation.
My reason is that I have classes and subclasses in the initial annotations, so the classes and subclasses are connected: some are very close and may slightly overlap. In addition, every part overlaps its parent category (e.g. dog_leye overlaps dog_whole). For this reason I set the "iscrowd" flag to true for every annotation.
The original annotations were made from binary masks stored in a set of mat files, so every annotation comes from a binary mask. In the process of creating the json files, pycococreator stored the bounding box as well as the segmentation.
I processed every file the same way, and I tried to pay attention to the two errors you mentioned, but I may have made a mistake. I actually regenerated the json files yesterday to recheck whether an empty mask could have slipped through instead of being ignored.
I am starting an investigation of the annotations and bounding boxes right now. If an empty mask has been processed, maybe I can find it first and then figure out what happened.
I am also preparing to do the same thing with a micro dataset from COCO (5 categories, 20 images: 6 train, 4 val, 10 test). First, generate a json from the COCO one and check that it works; then generate 3 json files from the masks for each category and check whether that works too.
Well there's your problem. "iscrowd" tells COCO to ignore every detection that overlaps with it. It's not used for training, it's used to give the model a break when there's a crowd of people and each person isn't individually annotated.
Do not use iscrowd for data you actually want to train on. Like every other method and like COCO does in evaluation, I completely ignore all detections that overlap with an "iscrowd" GT.
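Concretely, that is also why truths was empty above: the loss splits crowd boxes out of the ground truth before matching, so with every annotation flagged iscrowd there are zero real ground-truth boxes left, which is exactly the torch.Size([0, 4]) in the prints. A rough sketch of the crowd neutralization idea (illustrative only, not the exact code in box_utils.py, which handles crowd overlap slightly differently; crowd_iou_threshold is assumed to mirror the config option of the same name):

import torch

def box_iou(a, b):
    # Pairwise IoU between boxes a [N, 4] and b [M, 4] in (x1, y1, x2, y2).
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=2)
    area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def neutralize_crowd_matches(conf, priors, crowd_boxes, crowd_iou_threshold=0.7):
    # Priors labelled background (conf <= 0) that sit on a crowd region are
    # set to -1 ("neutral"): they are excluded from the loss entirely, so the
    # model is neither rewarded nor punished for detections inside a crowd.
    if crowd_boxes.numel() == 0:
        return conf
    best_crowd_overlap, _ = box_iou(priors, crowd_boxes).max(dim=1)
    conf[(conf <= 0) & (best_crowd_overlap > crowd_iou_threshold)] = -1
    return conf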
Ok, I did not understand it that way. I am starting the regeneration right now.
Great news: it works! The issue came from the iscrowd parameter.
Thank you very much for all the time you spent helping me, really, thank you very much.
@JSharp4273 Hi Julien, I have a similar problem. All of my images have many overlapping instances, so iscrowd=1 in the coco.json file. How can I train YOLACT with this data? I am able to train Mask R-CNN with it, but I would rather use YOLACT.
Regards, Sam @dbolya
EDIT: The solution was just to find and replace all iscrowd: 1 with iscrowd: 0. I'm using the RLE format (not polygon). I also used waspinator's function to generate the coco.json file.
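For anyone who prefers to do that in code rather than with a text editor, a small sketch (the file names are placeholders):

import json

# Placeholder paths: point these at the generated annotation file.
with open('coco.json') as f:
    coco = json.load(f)

# Clear the crowd flag on every annotation so YOLACT trains on them.
for ann in coco['annotations']:
    ann['iscrowd'] = 0

with open('coco_fixed.json', 'w') as f:
    json.dump(coco, f)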
Hello,
I am trying to retrain YOLACT on Pascal Part, a variation of Pascal VOC where each class has many sub-classes. To simplify everything, I made every sub-class a class, in addition to the 20 original ones, which gives me a set of 316 classes. I generated three JSON files, one for each split.
When I start training I encounter the following error:
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
which happens here: losses = criterion(out, wrapper, wrapper.make_mask()) in train.py, around line 262 (I added some prints to my file, so my line numbers differ).
From this issue, https://github.com/eriklindernoren/PyTorch-YOLOv3/issues/110, I read that it might be a path issue; however, I rechecked and the image paths are correct. Also, I am able to train on Pascal VOC using the same image paths without issues.
I tried to investigate the forward method of the loss function, looking for an empty tensor, but I did not find any.