STVIR / PMTD

Pyramid Mask Text Detector, designed by the SenseTime Video Intelligence Research team.

When I use four RTX 2080 Ti GPUs to train Mask R-CNN as a baseline, the F-measure is only about 65%. Is that normal? #7

Open kapness opened 5 years ago

kapness commented 5 years ago

How can I get good performance on four 11 GB GPUs?

JingChaoLiu commented 5 years ago

In our training, the original Mask R-CNN indeed only achieves an F-measure of 66%. The roughly 10% improvement in our baseline may come from the following (no ablation study, no guarantee, just based on memory):

  1. Data Augmentation +6%

  2. OHEM +2%

  3. Train->Test extends to Train+Validation-> Test +1%

  4. Use the Ignore Annotation +1%

Note: the first three tricks are elaborated in our paper. Recently, I noticed that the implementation of Use the Ignore Annotation was not part of the official implementation but came from the open-source repository matterport/Mask_RCNN, which our private framework followed.

The main idea of Use the Ignore Annotation is: when a predicted box overlaps an ignore-labeled ground-truth box at a high ratio, the predicted box is also labeled as ignore, in other words, neither positive nor negative. The details can be found in build_rpn_targets of the RPN and detection_targets_graph of the bbox branch. The only difference from cocoapi is that the evaluation criterion intersection / (gt_ignore_area + pred_area - intersection) < 0.001 is replaced by intersection / pred_area < 0.5.
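For concreteness, a minimal sketch of that ignore test (function and variable names are illustrative, not the repository's; the real logic lives in build_rpn_targets and detection_targets_graph):

```python
import numpy as np

def label_with_ignore(pred_boxes, gt_ignore_boxes, thresh=0.5):
    """Mark a predicted box as ignore when intersection / pred_area >= thresh.

    Boxes are [x1, y1, x2, y2] arrays. This is a sketch of the criterion
    described above, not the actual matterport/Mask_RCNN code.
    """
    ignore = np.zeros(len(pred_boxes), dtype=bool)
    for i, pb in enumerate(pred_boxes):
        pred_area = max(pb[2] - pb[0], 0) * max(pb[3] - pb[1], 0)
        if pred_area == 0:
            continue
        for gb in gt_ignore_boxes:
            # Intersection of the two axis-aligned boxes
            ix = max(0, min(pb[2], gb[2]) - max(pb[0], gb[0]))
            iy = max(0, min(pb[3], gb[3]) - max(pb[1], gb[1]))
            if ix * iy / pred_area >= thresh:
                ignore[i] = True  # neither positive nor negative
                break
    return ignore
```

Note the denominator is the predicted box's own area, so a small prediction lying mostly inside an ignore region is dropped even if its IoU with the region is low.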

kapness commented 5 years ago

Thanks very much for your reply. Now I have a new question. In the OHEM process, the paper says you select 512 difficult samples to update the network. Does that mean you only feed 512 samples to the ROI heads, or that you only compute the RPN loss over 512 samples?


JingChaoLiu commented 5 years ago

We only compute the loss over 512 samples for box_cls and box_reg, not in the RPN.
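In sketch form, the OHEM step described above might look like the following (NumPy stand-in; the function name, the simple sum used to rank RoIs, and the default of 512 are assumptions based on this thread, not the authors' code):

```python
import numpy as np

def ohem_select(cls_loss, reg_loss, num_hard=512):
    """Keep only the num_hard RoIs with the largest combined loss.

    cls_loss / reg_loss are per-RoI loss arrays from the box head.
    Only the selected samples contribute to box_cls and box_reg loss;
    the RPN loss is untouched, as stated above.
    """
    total = cls_loss + reg_loss                 # per-RoI hardness score
    k = min(num_hard, total.size)
    hard_idx = np.argsort(-total)[:k]           # indices of the k hardest RoIs
    loss = cls_loss[hard_idx].mean() + reg_loss[hard_idx].mean()
    return loss, hard_idx
```

In a real PyTorch training loop the same selection would be done with torch.topk on the per-sample losses before reduction, so that gradients only flow through the hard samples.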

zuokai commented 5 years ago

@JingChaoLiu hi, how many Data Augmentation methods do you use?

kapness commented 5 years ago

Hi, now I have a small problem. In the random crop process, do you make sure that every cropped region has at least one clear GT box? Because I find that Mask R-CNN can't compute the loss on an image with no GT box. Thanks for your kindness again!


kapness commented 5 years ago

Or do you just do the random crop and set the mask loss and box regression loss to 0? Because on the ICDAR15 dataset, if I only do random crop, there are too many cropped regions with no GT box, and the loss becomes bad.


JingChaoLiu commented 5 years ago

When there is no GT after cropping (though it rarely happens), just skip any steps involving positive ROIs (bbox regression and mask generation), set the corresponding losses to 0 (just for logging), and do not backward them. I guess here is a good position for ignoring these zero losses.
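A hypothetical sketch of that no-GT branch (all names are made up for illustration; the point is that the zero losses are only reported, never backpropagated):

```python
def roi_head_losses(positive_rois, compute_bbox_loss, compute_mask_loss):
    """Return (losses_for_logging, losses_to_backward).

    When a crop yields no positive RoIs, report zero bbox/mask losses so
    the log stays consistent, but return nothing for the backward pass.
    """
    if len(positive_rois) == 0:
        # No GT survived the crop: skip bbox regression and mask generation.
        return {"loss_bbox": 0.0, "loss_mask": 0.0}, []
    losses = {
        "loss_bbox": compute_bbox_loss(positive_rois),
        "loss_mask": compute_mask_loss(positive_rois),
    }
    return losses, list(losses.values())
```

The classification loss over negative RoIs (and the RPN objectness loss) can still be computed as usual, since those do not require positive samples.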

JingChaoLiu commented 5 years ago

By the way, all the images in ICDAR 2015 share the same shape of 1280x720, so, as mentioned in the paper, it is recommended to crop images while preserving the aspect ratio.
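An aspect-ratio-preserving random crop can be sketched as below (the scale range is an assumption for illustration, not the paper's exact setting):

```python
import random

def random_crop_keep_ratio(img_w, img_h, scale_range=(0.5, 1.0)):
    """Pick a crop window with the same aspect ratio as the full image.

    Scaling width and height by the same factor keeps w:h fixed, so for
    ICDAR 2015 every crop stays 16:9 like the original 1280x720 frames.
    """
    s = random.uniform(*scale_range)
    cw, ch = int(img_w * s), int(img_h * s)     # same w:h ratio as the image
    x0 = random.randint(0, img_w - cw)          # random top-left corner
    y0 = random.randint(0, img_h - ch)
    return x0, y0, x0 + cw, y0 + ch
```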