ChristianMarzahl / ObjectDetection

Some experiments with object detection in PyTorch

Index Error when using Transfer Learning #24

Open · EMUNES opened this issue 4 years ago

EMUNES commented 4 years ago

This is a great project that makes object detection in fastai much easier. Everything works well until I try transfer learning for object detection. My code to build the learner is pretty much the same as the cocotiny_retina_net example in the project (screenshot omitted here). I train for 8 rounds with size, bs = 512, 16, then assign learn.data = data, where the new data uses size, bs = 1024, 4; that is the only change for the transfer-learning stage.
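Since the screenshot may not come through, here is a rough reconstruction of the setup. The module paths, the resnet18 encoder, and the parameter values are my recollection of the example notebook rather than a copy of my code, and data / data_1024 stand for the 512px and 1024px DataBunch objects:

from fastai.vision import *
from object_detection_fastai.loss.RetinaNetFocalLoss import RetinaNetFocalLoss
from object_detection_fastai.models.RetinaNet import RetinaNet
from object_detection_fastai.helper.object_detection_helper import create_anchors

# 18 anchors per grid cell: 3 ratios x 6 scales (values from the example notebook)
anchors = create_anchors(sizes=[(32, 32), (16, 16), (8, 8), (4, 4)],
                         ratios=[0.5, 1, 2],
                         scales=[0.35, 0.5, 0.6, 1, 1.25, 1.5])

model = RetinaNet(create_body(models.resnet18, pretrained=True),
                  n_classes=data.train_ds.c, n_anchors=18,
                  sizes=[32, 16, 8, 4], chs=128, final_bias=-4., n_conv=3)

learn = Learner(data, model,                       # data built with size=512, bs=16
                loss_func=RetinaNetFocalLoss(anchors))
learn.fit_one_cycle(8, 1e-3)                       # first stage at 512

learn.data = data_1024                             # second stage: size=1024, bs=4
learn.lr_find()                                    # -> IndexError below

But whenever I train the model with image size 1024, I always get: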

IndexError                                Traceback (most recent call last)
<ipython-input-30-d81c6bd29d71> in <module>
----> 1 learn.lr_find()

D:\ProgramFile\anaconda\envs\ai\lib\site-packages\fastai\train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, wd)
     39     cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
     40     epochs = int(np.ceil(num_it/len(learn.data.train_dl))) * (num_distrib() or 1)
---> 41     learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
     42 
     43 def to_fp16(learn:Learner, loss_scale:float=None, max_noskip:int=1000, dynamic:bool=True, clip:float=None,

D:\ProgramFile\anaconda\envs\ai\lib\site-packages\fastai\basic_train.py in fit(self, epochs, lr, wd, callbacks)
    198         else: self.opt.lr,self.opt.wd = lr,wd
    199         callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
--> 200         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    201 
    202     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

D:\ProgramFile\anaconda\envs\ai\lib\site-packages\fastai\basic_train.py in fit(epochs, learn, callbacks, metrics)
     99             for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
    100                 xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101                 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
    102                 if cb_handler.on_batch_end(loss): break
    103 

D:\ProgramFile\anaconda\envs\ai\lib\site-packages\fastai\basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     28 
     29     if not loss_func: return to_detach(out), to_detach(yb[0])
---> 30     loss = loss_func(out, *yb)
     31 
     32     if opt is not None:

D:\ProgramFile\anaconda\envs\ai\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

D:\ProgramFile\anaconda\envs\ai\lib\site-packages\object_detection_fastai\loss\RetinaNetFocalLoss.py in forward(self, output, bbox_tgts, clas_tgts)
     53         focal_loss = torch.tensor(0, dtype=torch.float32).to(clas_preds.device)
     54         for cp, bp, ct, bt in zip(clas_preds, bbox_preds, clas_tgts, bbox_tgts):
---> 55             bb, focal = self._one_loss(cp, bp, ct, bt)
     56 
     57             bb_loss += bb

D:\ProgramFile\anaconda\envs\ai\lib\site-packages\object_detection_fastai\loss\RetinaNetFocalLoss.py in _one_loss(self, clas_pred, bbox_pred, clas_tgt, bbox_tgt)
     32         bbox_mask = matches >= 0
     33         if bbox_mask.sum() != 0:
---> 34             bbox_pred = bbox_pred[bbox_mask]
     35             bbox_tgt = bbox_tgt[matches[bbox_mask]]
     36             bb_loss = self.reg_loss(bbox_pred, bbox_to_activ(bbox_tgt, self.anchors[bbox_mask]))

IndexError: The shape of the mask [24480] at index 0 does not match the shape of the indexed tensor [24192, 4] at index 0

I work on Windows 10 with CUDA 10.2, PyTorch 1.5.0 and torchvision 0.6, and everything works fine apart from this. Every image in a batch still has a consistent size when I move from size, bs = 512, 16 to size, bs = 1024, 4, doesn't it? How can this IndexError occur?
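One thing I notice when staring at the two numbers (assuming the 18 anchors per grid cell from the example setup above, 3 ratios x 6 scales): both factor cleanly by 18, and the remaining factors are sums of squared feature-map sizes:

anchors_per_cell = 18                 # 3 ratios x 6 scales, as configured above

print(24480 / anchors_per_cell)       # 1360.0 = 32**2 + 16**2 + 8**2 + 4**2
print(24192 / anchors_per_cell)       # 1344.0 = 32**2 + 16**2 + 8**2

If that reading is right, the mask built from the loss's stored anchors covers one more pyramid level (the 4x4 grid) than the predictions the model actually emits after the size change.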

EMUNES commented 4 years ago

Another similar problem occurs when I directly use images of size 512 for inference (resizing them to 1024 inside the loop). The code I use, adapted from the baseline:

import torch
# process_output's path is confirmed by the traceback below; nms and rescale_box
# are assumed to live in the same helper module, and the anchors variable is the
# one built for the model earlier in the notebook.
from object_detection_fastai.helper.object_detection_helper import process_output, nms, rescale_box
from fastai.vision import *   # for to_np

detect_thresh = 0.3
nms_thresh = 0.5
size = 1024

nr = []
pos = []
for image_id, (test_image, emp_label) in enumerate(learn.data.test_ds):
    pos.append([0, 0, 0, 0])
    nr.append(0)

    image_ori = test_image
    ori_shape = image_ori.shape[-2:]  # (H, W); the cv2 baseline used image_ori.shape[:2]

    # The baseline's cv2.resize / pil2tensor / Normalize steps are replaced
    # by fastai's own resize:
    image = test_image.resize(size).data.cuda()

    class_pred_batch, bbox_pred_batch, _ = learn.model(image[None, :])

    for clas_pred, bbox_pred in zip(class_pred_batch, bbox_pred_batch):
        bbox_pred, scores, preds = process_output(clas_pred, bbox_pred, anchors, detect_thresh)

        if bbox_pred is not None:
            to_keep = nms(bbox_pred, scores, nms_thresh)
            bbox_pred, preds, scores = bbox_pred[to_keep].cpu(), preds[to_keep].cpu(), scores[to_keep].cpu()

            # Rescale the boxes back to the original image size.
            t_sz = torch.Tensor(ori_shape)[None].float()
            bbox_pred = rescale_box(bbox_pred, t_sz)

            # Record the centres of (at most) the first two boxes.
            temp_pos = [0, 0, 0, 0]
            for i, box in enumerate(bbox_pred[:2]):
                x = to_np(box[0] + box[2] / 2)
                y = to_np(box[1] + box[3] / 2)
                if i == 0:
                    temp_pos[:2] = x, y
                    nr[image_id] = 1
                else:
                    temp_pos[2:] = x, y
                    nr[image_id] = 2
            pos[image_id] = temp_pos  # presumably what the snippet intends to store

Output:

RuntimeError                              Traceback (most recent call last)
<ipython-input-65-883a98e5b6df> in <module>
     27 
     28     for clas_pred, bbox_pred in zip(class_pred_batch, bbox_pred_batch):
---> 29         bbox_pred, scores, preds = process_output(clas_pred, bbox_pred, anchors, detect_thresh)
     30 
     31         if bbox_pred is not None:

D:\ProgramFile\anaconda\envs\ai\lib\site-packages\object_detection_fastai\helper\object_detection_helper.py in process_output(clas_pred, bbox_pred, anchors, detect_thresh)
    227 
    228 def process_output(clas_pred, bbox_pred, anchors, detect_thresh=0.25):
--> 229     bbox_pred = activ_to_bbox(bbox_pred, anchors.to(clas_pred.device))
    230     clas_pred = torch.sigmoid(clas_pred)
    231 

D:\ProgramFile\anaconda\envs\ai\lib\site-packages\object_detection_fastai\helper\object_detection_helper.py in activ_to_bbox(acts, anchors, flatten)
     74     if flatten:
     75         acts.mul_(acts.new_tensor([[0.1, 0.1, 0.2, 0.2]]))
---> 76         centers = anchors[...,2:] * acts[...,:2] + anchors[...,:2]
     77         sizes = anchors[...,2:] * torch.exp(acts[...,2:])
     78         return torch.cat([centers, sizes], -1)

RuntimeError: The size of tensor a (24480) must match the size of tensor b (24192) at non-singleton dimension 0

The two numbers don't match, just as above: 24480 (the number of anchors) and 24192 (what is this one about?). I guess this has something to do with the resizing, since both problems occur after resizing, but I really can't come up with anything to fix it. Also, I'm using the GPU for training.
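For anyone hitting the same thing, here is the direction I would try. This is only a sketch, and it assumes that the third output of learn.model (the one discarded as _ above) is the list of per-level feature-map sizes in the form create_anchors expects:

import torch
from object_detection_fastai.helper.object_detection_helper import create_anchors

# Ask the model which pyramid grids it actually produces at the new size
# (image is the 1024px tensor from the loop above).
with torch.no_grad():
    _, bbox_pred_batch, model_sizes = learn.model(image[None, :])
print(bbox_pred_batch.shape)          # e.g. torch.Size([1, 24192, 4])

# Rebuild the anchors for those grids instead of the hard-coded 512px ones;
# the ratios/scales are assumptions taken from the example notebook.
anchors = create_anchors(sizes=model_sizes,
                         ratios=[0.5, 1, 2],
                         scales=[0.35, 0.5, 0.6, 1, 1.25, 1.5])

If the rebuilt anchors make the counts agree, the training-time IndexError should have the same cure: recreate the loss with the new anchors (e.g. learn.loss_func = RetinaNetFocalLoss(anchors)) after assigning the 1024px data.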