STVIR / PMTD

Pyramid Mask Text Detector, designed by the SenseTime Video Intelligence Research team.

recommended configuration for a smaller batch size setting #6

Closed QingqingWang-1 closed 5 years ago

QingqingWang-1 commented 5 years ago

Dear author, do you have a recommended configuration for a smaller batch size setting? I got NaN under the setting batch_size=36, LR=0.04, even when I use 1*binary_cross_entropy loss. When I reduce the LR to 0.004 or 0.001, the model does not converge well. I even tried the AMSGrad optimizer with different LRs.

By the way, I calculate the cropped text area via cv2.findContours(). Is it OK?

JingChaoLiu commented 5 years ago

I got NaN under the setting batch_size=36, LR=0.04,

In our earliest baseline setting, we used just 8 cards with batch_size=16 and base_learning_rate = 16 * 0.00125 = 0.02, warming up for 2 epochs and training for 40 epochs (the shrinking scheduler mentioned in #2). Though the F-measure is just 60%+, it converges smoothly. Maybe you need to check the labels?
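The base learning rate here follows the linear scaling rule: the LR grows proportionally with the total batch size, at 0.00125 per image. A minimal sketch of that arithmetic (the helper name is my own, not from the PMTD code):

```python
def scaled_lr(total_batch_size, lr_per_image=0.00125):
    """Linear scaling rule: base LR is proportional to the total batch size."""
    return total_batch_size * lr_per_image

# batch_size=16 (8 cards, 2 images each) gives the base LR of 0.02 used above
base_lr = scaled_lr(16)
```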

even when I use 1*binary_cross_entropy loss

The loss weight of the mask branch remains unchanged until the loss type is changed to l1_loss for the pyramid label. The loss weight of binary_cross_entropy should probably be kept at 1 (we haven't done loss-weight experiments for binary_cross_entropy).

I calculate the cropped text area via cv2.findContours(). Is it OK?

Do you mean calculating the text box from the corresponding predicted mask during the inference stage? Yes, for the baseline, the text box is calculated by cv2.findContours, and the contour with the max area is selected and wrapped by cv2.minAreaRect to output the final text box.

QingqingWang-1 commented 5 years ago

Hi Jingchao,

Many thanks for your reply. I have figured out that the NaN problem is caused by images without text areas. Once I solved it, the loss decreases smoothly, but the result is not as good as your baseline's. Since my batch size is small, do you think I should change SyncBN to group BN? Will the final performance be affected by the batch size setting? By the way, what is your setting for FPN_POST_NMS_TOP_N_TRAIN?

cv2.findContours is used in both the cropping operation and the inference stage. In your issue, you said you find the 3-8 crossing points of the cropped areas, but I use cv2.fillPoly() and cv2.findContours() to find the cropped text areas. I visualized the cropped results and they seem OK.

Best regards, Qingqing Wang



JingChaoLiu commented 5 years ago

do you think I should change syncBN to group BN?

We didn't perform experiments with the group normalization (GN) provided by maskrcnn-benchmark. GN is worth trying.

Will the final performance be affected by the setting of batch size?

In our experiments, 8 cards with base_learning_rate=0.01, 16 cards with base_learning_rate=0.02, and 32 cards with base_learning_rate=0.04 show no significant difference (within 0.1%).

what is your setting for FPN_POST_NMS_TOP_N_TRAIN?

These settings are: MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN = 2000 and MODEL.RPN.FPN_POST_NMS_PER_BATCH = False.
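In a maskrcnn-benchmark-style YAML config, those two keys would sit under the RPN section (a sketch, assuming that codebase's standard config layout):

```yaml
MODEL:
  RPN:
    FPN_POST_NMS_TOP_N_TRAIN: 2000
    FPN_POST_NMS_PER_BATCH: False
```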

I use cv2.fillPoly() and cv2.findContours() to find the cropped text areas

During the training stage, in what form do the text areas exist before data augmentation: binary masks of shape (bbox_h, bbox_w), or polygons of point_num (x, y) points?

jylins commented 5 years ago

Hi @JingChaoLiu, is there an API for cropping the text area as a polygon?

JingChaoLiu commented 5 years ago

Both the pyclipper and Polygon3 libraries can do this. Reimplementing PolygonInstance.crop (link) may be a proper way to do it.
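For the common case of clipping a text polygon against an axis-aligned crop box, the operation can also be sketched without extra dependencies via Sutherland-Hodgman clipping (my own sketch, not PMTD code; pyclipper and Polygon3 handle general polygon-against-polygon clipping more robustly):

```python
def clip_polygon_to_box(polygon, box):
    """Clip a polygon (list of (x, y) tuples) against an axis-aligned box
    (x0, y0, x1, y1) using Sutherland-Hodgman clipping."""
    x0, y0, x1, y1 = box

    def x_cut(p, q, x):
        # Intersection of segment p->q with the vertical line at x.
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))

    def y_cut(p, q, y):
        # Intersection of segment p->q with the horizontal line at y.
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)

    def clip_edge(points, inside, cut):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(cut(prev, cur))  # segment enters the clip side
                out.append(cur)
            elif inside(prev):
                out.append(cut(prev, cur))      # segment leaves the clip side
        return out

    pts = list(polygon)
    for inside, cut in (
        (lambda p: p[0] >= x0, lambda p, q: x_cut(p, q, x0)),  # left edge
        (lambda p: p[0] <= x1, lambda p, q: x_cut(p, q, x1)),  # right edge
        (lambda p: p[1] >= y0, lambda p, q: y_cut(p, q, y0)),  # top edge
        (lambda p: p[1] <= y1, lambda p, q: y_cut(p, q, y1)),  # bottom edge
    ):
        if not pts:
            break
        pts = clip_edge(pts, inside, cut)
    return pts
```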

jylins commented 5 years ago

@JingChaoLiu Thanks!

QingqingWang-1 commented 5 years ago

@JingChaoLiu Many thanks for your implementation details.

jylins commented 5 years ago

Hi @JingChaoLiu, could you share your implementation of SyncBN? I tried to use torch.nn.SyncBatchNorm in PyTorch 1.1, but it crashes in our program.

QingqingWang-1 commented 5 years ago

Hi @JingChaoLiu, could you share your implementation of SyncBN? I tried to use torch.nn.SyncBatchNorm in PyTorch 1.1, but it crashes in our program.

I don't know what the authors' implementation is, but I implement it by using torch.nn.BatchNorm2d and torch.nn.BatchNorm1d in the model part, and the following in train_net.py:

if distributed:
    sync_bn_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = torch.nn.parallel.DistributedDataParallel(sync_bn_model, device_ids=[local_rank], output_device=local_rank)

Remember to change if isinstance(module, torch.nn.modules.batchnorm._BatchNorm): (torch/nn/modules/batchnorm.py, line 495) to if isinstance(module, torch.nn.modules.batchnorm.BatchNorm2d): Otherwise, the model will crash.

hityzy1122 commented 5 years ago


Thanks, it's really helpful