STVIR / PMTD

Pyramid Mask Text Detector, designed by the SenseTime Video Intelligence Research team.

About configurations #8

Open hellbell opened 5 years ago

hellbell commented 5 years ago

First, thank you for your kind paper and GitHub page. Your work is super useful for studying text detection with a Mask R-CNN baseline. I am reproducing the results of PMTD, but my results are a little worse (Mask R-CNN baseline: 60% F-measure on the MLT dataset), so I'm trying to figure out what is wrong with my configuration. It would be very helpful if the config file (.yaml) were provided, or if you could let me know the RPN.ANCHOR_STRIDE setting (currently I'm using (4, 8, 16, 32, 64)). Thanks!

kapness commented 5 years ago

I think you may have run into the same problem I met before. You can have a look at my issue; the author gives some useful advice there.

hellbell commented 5 years ago

@kapness Thank you for the kind reply! I followed your issue, but the results were still worse than I expected. It would be very helpful if you shared your config file (.yaml) :) Thank you again.

kapness commented 5 years ago

If you implement the data augmentation correctly in transform.py, the F-score can reach 72% without other changes. I did not change the original yaml file.

kapness commented 5 years ago

My batch size is 16 and the LR starts at 0.01.

hellbell commented 5 years ago

@kapness Thank you for your advice. I will try it right now!

JingChaoLiu commented 5 years ago

@kapness Thanks a lot!

kapness commented 5 years ago

@hellbell Also, _C.MODEL.RPN.ASPECT_RATIOS in defaults.py should be modified as the paper says. I forgot this tip before.
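
For concreteness, a minimal sketch of that change, using the ratios quoted elsewhere in this thread (you can edit defaults.py directly or override the value from your yaml config):

    # maskrcnn_benchmark/config/defaults.py (sketch)
    # Replace the default object-detection ratios (0.5, 1.0, 2.0) with
    # the text-oriented ones discussed in this thread.
    _C.MODEL.RPN.ASPECT_RATIOS = (0.17, 0.44, 1.13, 2.90, 7.46)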

hellbell commented 5 years ago

@kapness @JingChaoLiu Thank you for your kind replies. I trained vanilla Mask R-CNN on ICDAR2017-MLT and got an F-score of only 62%, which is still far below the baseline. My settings:

based on e2e_mask_rcnn_R_50_FPN_1x.yaml

changed MODEL.RPN.ASPECT_RATIOS: (0.17, 0.44, 1.13, 2.90, 7.46)

changed MODEL.RPN.FPN_POST_NMS_PER_BATCH = False

4 GPUs, with this solver setting:

    SOLVER:
      BASE_LR: 0.01
      WEIGHT_DECAY: 0.0001
      STEPS: (50000, 80000)
      MAX_ITER: 100000
      IMS_PER_BATCH: 16

My questions are:

  1. At test time, the confidence score threshold for selecting valid bounding boxes is set to 0.5. Is that okay?

  2. I suspect my data augmentation in transform.py might be wrong. Would you share your transform.py file or give me some tips? Here is my code snippet:

        import random
        import torchvision.transforms.functional as F

        class RandomSampleCrop(object):
            def __init__(self, crop_size=640, min_size=640, max_size=2560):
                self.crop_size = crop_size
                self.min_size = min_size
                self.max_size = max_size

            def get_size(self):
                # w, h = image_size
                w_resize = random.randint(self.min_size, self.max_size)
                h_resize = random.randint(self.min_size, self.max_size)
                return (h_resize, w_resize)

            def __call__(self, image, target):
                while True:
                    # resize image and targets to a random size, then try a random crop
                    resized_size = self.get_size()
                    image_r = F.resize(image, resized_size)
                    target_r = target.resize(image_r.size)
                    width, height = image_r.size
                    crop_left = random.randint(0, width - self.crop_size)
                    crop_top = random.randint(0, height - self.crop_size)
                    target_r_c = target_r.crop([crop_left, crop_top,
                                                crop_left + self.crop_size,
                                                crop_top + self.crop_size])
                    target_r_c = target_r_c.clip_to_image()
                    if len(target_r_c) > 0:
                        # reject crops that produce degenerate (sub-pixel) boxes
                        too_small = False
                        for t in target_r_c.bbox:
                            w, h = t[2] - t[0], t[3] - t[1]
                            if w < 1 or h < 1:
                                too_small = True
                        if too_small:
                            continue
                        break
                image_r_c = image_r.crop([crop_left, crop_top,
                                          crop_left + self.crop_size,
                                          crop_top + self.crop_size])

Many thanks!

kapness commented 5 years ago

If you use the original crop function implemented in maskrcnn-benchmark, that may be the problem. I don't think it crops the mask ground truth properly. You can see its source code in maskrcnn_benchmark/structures.

hellbell commented 5 years ago

@kapness I checked the crop function with some visualization. It seems OK.

JingChaoLiu commented 5 years ago

@kapness thanks again for your reply. @hellbell

  1. Following the previous answers and the paper, here is one configuration I just wrote. Sorry, I haven't had time to validate it, so no guarantee on the F-measure.

    MODEL:
      META_ARCHITECTURE: "GeneralizedRCNN"
      WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
      BACKBONE:
        CONV_BODY: "R-50-FPN"
      RESNETS:
        BACKBONE_OUT_CHANNELS: 256
      RPN:
        USE_FPN: True
        ANCHOR_STRIDE: (4, 8, 16, 32, 64)
        ANCHOR_SIZES: (16, 32, 64, 128, 256)
        ASPECT_RATIOS: (0.17, 0.44, 1.13, 2.90, 7.46)
        # Remove RPN anchors that go outside the image by more than STRADDLE_THRESH pixels.
        # I accidentally changed this value from 0 to 10 in the early stage and forgot to
        # change it back, but I think this change makes no difference.
        STRADDLE_THRESH: 10
        PRE_NMS_TOP_N_TRAIN: 2000
        PRE_NMS_TOP_N_TEST: 1000
        POST_NMS_TOP_N_TEST: 1000
        FPN_POST_NMS_TOP_N_TEST: 1000
        FPN_POST_NMS_PER_BATCH: False
      ROI_HEADS:
        USE_FPN: True
      ROI_BOX_HEAD:
        NUM_CLASSES: 2
        POOLER_RESOLUTION: 7
        POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
        POOLER_SAMPLING_RATIO: 2
        FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
        PREDICTOR: "FPNPredictor"
      ROI_MASK_HEAD:
        POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
        FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
        PREDICTOR: "MaskRCNNC4Predictor"
        POOLER_RESOLUTION: 14
        POOLER_SAMPLING_RATIO: 2
        RESOLUTION: 28
        SHARE_BOX_FEATURE_EXTRACTOR: False
      MASK_ON: True
    DATASETS:
      TRAIN: ("icdar_2017_mlt_train", "icdar_2017_mlt_val")
      TEST: ("icdar_2017_mlt_test",)
    DATALOADER:
      SIZE_DIVISIBILITY: 32
    SOLVER:
      WARMUP_METHOD: 'linear' # PMTD uses 'exponential', which is not implemented in maskrcnn-benchmark
      WARMUP_ITERS: 4500 # warmup_iter = image_num=9000 * warmup_epoch=8 / batch_size=16
      IMS_PER_BATCH: 16
      BASE_LR: 0.02 # PMTD uses batch_size * 0.00125 with syncBN
      WEIGHT_DECAY: 0.0001
      STEPS: (49500, 76500) # warmup_iter + (iter * 0.5, iter * 0.8)
      MAX_ITER: 94500 # iter = image_num=9000 * epoch=160 / batch_size=16 = 90000; max_iter = warmup_iter + iter
  2. Have you done a grid search over the parameters (cls_threshold, nms_threshold) of the final NMS? See #4 for more details. This can make a bigger difference than some negligible training details. (A minimal search sketch follows this list.)

  3. See #5 for the problematic crop operation. There are two problems. First, the number of points in the cropped mask polygon may vary from 3 to 8, no longer a constant 4. Second, the crop of the original bounding box differs from the correct bounding box of the cropped mask. (A small demonstration of the first problem follows below.)
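
For item 2, a minimal, hypothetical grid-search sketch; evaluate_fscore is a placeholder for your own ICDAR evaluation routine, and the sweep ranges are only a guess:

    import numpy as np

    # evaluate_fscore(cls_thresh, nms_thresh) is assumed to run inference with
    # the given thresholds and return the F-measure on a validation set.
    best_f, best_params = 0.0, None
    for cls_thresh in np.arange(0.30, 0.95, 0.05):
        for nms_thresh in np.arange(0.10, 0.60, 0.05):
            f = evaluate_fscore(cls_thresh, nms_thresh)
            if f > best_f:
                best_f, best_params = f, (cls_thresh, nms_thresh)
    print("best F-measure %.4f at (cls, nms) = %s" % (best_f, best_params))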
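
For item 3, a small self-contained demonstration of the first problem, using shapely purely for illustration (this is not the maskrcnn-benchmark code path): clipping a rotated quadrilateral against a crop window changes the vertex count.

    from shapely.geometry import Polygon, box

    quad = Polygon([(0, 5), (5, 0), (12, 7), (7, 12)])  # a rotated "text box" with 4 points
    crop = box(0, 0, 10, 10)                            # the crop window
    clipped = quad.intersection(crop)
    # The clipped polygon has 6 vertices, not 4, so any code assuming a
    # constant 4 points per mask polygon will mis-handle it.
    print(len(clipped.exterior.coords) - 1)  # -> 6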

kapness commented 5 years ago

Now I have one question about OHEM. In the paper, you compute 512 proposals for OHEM in roi_heads, is that right? (Or should I modify it in the RPN branch?) But my batch size is smaller than yours; for example, with a batch size of 16, each GPU computes 8 images. Does that make a difference to OHEM? In maskrcnn-benchmark, the roi_head branch gets 512 proposals per image. Thanks for your kind reply again; I think this is my last question about the baseline...

JingChaoLiu commented 5 years ago

OHEM is done in the bbox branch, not in the RPN. Compared with the inference data flow described in #4, the training data flow is as follows; some details about the losses are also included.

  1. image -> backbone
  2. -> RPN

    pred_cls, pred_reg = RPN.forward(all proposals)
    randomly sample sample_num = RPN.BATCH_SIZE_PER_IMAGE=256 * image_num proposals to calculate the loss (sample_num is far less than len(all proposals))
    postprocess all proposals to output MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN * image_num proposals, given RPN.FPN_POST_NMS_PER_BATCH = False

  3. RPN -> bbox branch

    pred_cls, pred_reg = bbox.forward(the proposals output by the RPN)
    randomly sample ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals to calculate the loss
    (OHEM here) sort all ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals by cls_loss + reg_loss, then keep the loss of the top 512 proposals and set the loss of the other proposals to 0

  4. RPN -> mask branch

    pred_mask = mask.forward(the positive proposals output by the RPN)
    calculate the mask loss for all predicted masks

  5. backward the loss to update parameters

my batch size is smaller than yours; for example, with a batch size of 16, each GPU computes 8 images. Does that make a difference to OHEM?

batch_size = 16 is enough.
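
As a side note, here is a minimal PyTorch sketch of the OHEM rule in step 3, assuming cls_loss and reg_loss are per-proposal tensors over all sampled proposals in the batch (this is my reading of the description above, not the authors' code):

    import torch

    def ohem_keep_topk(cls_loss, reg_loss, keep=512):
        # Hardness of each proposal = sum of its classification and regression losses.
        hardness = cls_loss + reg_loss
        if hardness.numel() <= keep:
            return cls_loss.sum(), reg_loss.sum()
        # Keep the loss of the top-k hardest proposals, zero out the rest.
        _, hard_idx = hardness.topk(keep)
        mask = torch.zeros_like(hardness, dtype=torch.bool)
        mask[hard_idx] = True
        return cls_loss[mask].sum(), reg_loss[mask].sum()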

kapness commented 5 years ago

Thanks very much for saving me!

kapness commented 5 years ago

I'm sorry to disturb you again.

(OHEM here) sort all ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals by cls_loss + reg_loss, then keep the loss of the top 512 proposals and set the loss of the other proposals to 0

Here, the original code computes the reg_loss only for the positive proposals. Should I first set the reg_loss of the negative proposals to 0, then add cls_loss to reg_loss and sort?

JingChaoLiu commented 5 years ago

Yes, for the negative proposals, just set the reg_loss to 0 before sorting.
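
In code terms, a sketch of this on top of the ohem_keep_topk sketch earlier in this thread (positive is assumed to be a boolean mask over the sampled proposals):

    # Zero the regression loss of negative proposals before computing hardness.
    reg_loss = torch.where(positive, reg_loss, torch.zeros_like(reg_loss))
    cls_sum, reg_sum = ohem_keep_topk(cls_loss, reg_loss, keep=512)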