facebookresearch / multipathnet

A Torch implementation of the object detection network from "A MultiPath Network for Object Detection" (https://arxiv.org/abs/1604.02135)

A guide to preparing data for training a detection model? #16

Closed samson-wang closed 7 years ago

samson-wang commented 8 years ago

I notice that the training code has been somewhat "hard coded" to particular versions of the PASCAL VOC and COCO datasets. I'm trying to figure out the data flow and the data format requirements for running training on new data, but I still have some problems loading and preparing data before training (even on the VOC or COCO data). Could anyone give some advice to help me build up the process?

Right now I have some images and corresponding bounding-box annotations. If I want to train on this data, I need to generate proposals (e.g. 1000 per image) and put the annotations and proposals into the minimal required Torch formats.

I think I have to write some pieces of code to implement this myself.

I hope to be a contributor. ;-)

szagoruyko commented 8 years ago

@samson-wang the code is generic and can be used for COCO/VOC/ImageNet, given JSON annotations in the right format (similar to http://mscoco.org/external/) and proposals similar to the ones we provide, in Torch format.
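For reference, a COCO-style annotation file has roughly the following shape. This is a minimal sketch in Python for illustration only (the field names follow the MS COCO convention; the file name and values are made up):

```python
import json

# Minimal COCO-style annotation structure (illustrative values only).
# bbox is [x, y, width, height] in pixel coordinates.
annotations = {
    "images": [
        {"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [78, 89, 69, 40], "area": 69 * 40, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "my_object"},
    ],
}

# Write the annotation file that the dataset loader would read.
with open("instances_train.json", "w") as f:
    json.dump(annotations, f)
```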

samson-wang commented 8 years ago

@szagoruyko Thank you! I'm working on it. For the proposals I want to use deepmask. Judging from data/proposals/coco/deepmask/val.t7, the file should contain images, boxes, and scores. However, the scores do not appear to be sorted. In the deepmask project there is only a getTopProps function, which generates proposals ordered by their scores. Is score ordering something I should take care of? Another thing: the number of boxes per image varies in the PASCAL selective-search proposals, while it stays constant for deepmask. Any reason for that?

Thanks!

samson-wang commented 8 years ago

@szagoruyko I have trained on my own data, which has only 1 category of bounding box. The trick is to change opt.num_classes = opt.dataset == 'pascal' and 21 or 81 to opt.num_classes = 2; I'm not sure if that makes sense. However, when running the demo with the trained model, something weird happens: after executing prob, maxes = detections:max(2), I get all 1s for both prob and maxes, which leads to selecting from an empty tensor in the following code: local idx = maxes:squeeze():gt(1):cmul(prob:gt(config.thr)):nonzero():select(2,1). Could you give some advice? Thank you!
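The failure mode can be reproduced in isolation: if the classifier assigns its highest score to class 1 (background) for every proposal, the index tensor built by that line is empty. A rough numpy re-creation of the selection logic (a sketch for illustration, not the actual demo.lua code; note Lua is 1-based, so background is class 0 here):

```python
import numpy as np

def select_detections(detections, thr):
    """Mimic the selection in demo.lua: keep proposals whose argmax class
    is not background (index 0 here) and whose max probability exceeds
    thr. Returns the row indices of the kept proposals."""
    prob = detections.max(axis=1)
    maxes = detections.argmax(axis=1)  # 0-based; 0 = background
    keep = (maxes > 0) & (prob > thr)
    return np.nonzero(keep)[0]

# Every proposal scored as background -> nothing survives the filter,
# which is the "select on empty tensor" error described above.
all_background = np.array([[0.9, 0.1], [0.8, 0.2], [0.95, 0.05]])
print(select_detections(all_background, thr=0.5))  # -> empty
```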

samson-wang commented 8 years ago

I think the problem may be that there are too few positive samples in the training dataset, so at prediction time all proposals are classified as negative. Training data summary: 2000 images, 1 object category, 1 ground-truth bbox per image, 1000 proposals per image generated by deepmask, 100 epochs.

Can I set a higher learning rate for positive samples?

szagoruyko commented 8 years ago

@samson-wang looks like you need to adjust the fraction of positive examples per batch to balance your data; check here: https://github.com/facebookresearch/multipathnet/blob/master/BatchProviderROI.lua#L19
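The idea behind that option can be sketched as follows (a simplified Python illustration of foreground/background batch balancing, not the actual BatchProviderROI code; the function name and defaults are made up): each minibatch is filled with a fixed fraction of foreground ROIs, no matter how rare positives are in the dataset.

```python
import random

def sample_rois(fg_rois, bg_rois, batch_size=128, fg_fraction=0.25, rng=None):
    """Fill a minibatch with up to fg_fraction * batch_size foreground
    ROIs and pad the rest with background ROIs (simplified sketch)."""
    rng = rng or random.Random(0)
    n_fg = min(int(round(fg_fraction * batch_size)), len(fg_rois))
    n_bg = min(batch_size - n_fg, len(bg_rois))
    batch = rng.sample(fg_rois, n_fg) + rng.sample(bg_rois, n_bg)
    rng.shuffle(batch)  # mix positives and negatives within the batch
    return batch
```

Raising the foreground fraction has a similar effect to the higher learning rate for positives asked about above: it increases the share of the gradient signal that comes from positive examples.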

samson-wang commented 8 years ago

@szagoruyko Thank you for the tip! I found that running demo.lua returns all negatives after detector:detect(img:float(), bboxes:float()), even on an image from the training set. On the other hand, running run_test.lua on the training set works fine, and evaluation on the test set gives 0.33 AP @ 0.75 and 0.77 AP @ 0.5, which is not as bad as the demo result. After some debugging (though it is not conclusive) I found something weird. I use the following code to generate proposals with deepmask for training and testing:

    -- load image
    local img = image.load(img_file)
    local h,w = img:size(2),img:size(3)

    -- forward all scales
    infer:forward(img)

    -- get top proposals and their scores
    local masks,topScores = infer:getTopProps(.2,h,w)
    local rs = maskApi.encode(masks)
    local bbs = maskApi.toBbox(rs)

    -- accumulate per-image entries in the images/scores/boxes tables
    table.insert(images, paths.basename(img_file))
    table.insert(scores, topScores:index(2, torch.LongTensor{1}))
    table.insert(boxes, bbs)

The generated bounding boxes look like

  78   89   69   40
   0   22  624  618
...

At evaluation time, after the execution of getROIBoxes, the corresponding boxes look like

89  78   40   69
22   0  618 624
...

The positions have been switched. I'm not sure whether this is a problem; still working on it.

samson-wang commented 8 years ago

Update: the boxes are permuted at https://github.com/facebookresearch/multipathnet/blob/e6b9e0dc68db5af4662be5f6c272c1db82ab514d/DataSetJSON.lua#L234.
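The permutation observed above is a plain column swap from (x, y, w, h) to (y, x, h, w). A quick Python check against the numbers from the previous comment (assuming that interpretation of the columns):

```python
def swap_xy(box):
    """Swap a box from (x, y, w, h) to (y, x, h, w), matching the
    column permutation observed after getROIBoxes."""
    x, y, w, h = box
    return [y, x, h, w]

print(swap_xy([78, 89, 69, 40]))   # -> [89, 78, 40, 69]
print(swap_xy([0, 22, 624, 618]))  # -> [22, 0, 618, 624]
```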

samson-wang commented 8 years ago

@szagoruyko Stupid mistake: the image transformer was not the same for training and evaluation, so the inferred scores were wrong.
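The general lesson: whatever image transform is applied at training time (mean/std normalization, channel order, scaling) must be applied identically at inference, otherwise the network sees inputs from a different distribution and its scores are meaningless. A minimal Python sketch of sharing one transform between both phases (the statistics here are hypothetical placeholders):

```python
# Hypothetical per-channel statistics; the point is that ONE transform
# is defined once and reused for both training and evaluation.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def transform(pixel_rgb):
    """Normalize one RGB pixel (values in [0, 1]) with the shared stats."""
    return [(c - m) / s for c, m, s in zip(pixel_rgb, MEAN, STD)]

# Training and evaluation call the exact same function, so the
# preprocessing cannot drift apart between the two phases.
train_input = transform([0.5, 0.5, 0.5])
eval_input = transform([0.5, 0.5, 0.5])
assert train_input == eval_input
```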

teezeit commented 8 years ago

Hi Samson, did you get it working? I am also trying to set up my own training pipeline; what did your workflow end up like?