tsing-cv opened this issue 5 years ago
Thank you for your response. I have a few ops differences from your proposal.
Hi @JingChaoLiu, I have another question. Do you use the ignore data (###) in ICDAR 2017 during training?
x1,y1,x2,y2,x3,y3,x4,y4,###
Yes, we use them. The boxes with ignore=True in ICDAR 2017 are similar to those with is_crowd=True in COCO, so we follow the settings of Mask R-CNN: noting intersection_area = predict_box ∩ groundtruth_ignore_box, if intersection_area / groundtruth_ignore_box > 0.5, then the predict_box is set to ignore, namely neither positive nor negative.
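In pseudo-PyTorch, the rule looks roughly like this (a minimal sketch with a hypothetical helper name, not the exact code; it just follows the ratio as stated above):

```python
import torch

def ignore_mask(predict_boxes, ignore_boxes, thresh=0.5):
    """predict_boxes: [P, 4], ignore_boxes: [I, 4], both (x1, y1, x2, y2).
    Returns a [P] bool tensor: True => neither positive nor negative."""
    # pairwise intersection area: [P, I]
    lt = torch.max(predict_boxes[:, None, :2], ignore_boxes[None, :, :2])
    rb = torch.min(predict_boxes[:, None, 2:], ignore_boxes[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    intersection = wh[..., 0] * wh[..., 1]

    ignore_area = ((ignore_boxes[:, 2] - ignore_boxes[:, 0]) *
                   (ignore_boxes[:, 3] - ignore_boxes[:, 1])).clamp(min=1e-6)
    # intersection_area / groundtruth_ignore_box > 0.5 => set to ignore
    return (intersection / ignore_area[None, :] > thresh).any(dim=1)
```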
Hi @tsing-cv, could you possibly share your implementation? I would appreciate it.
@GarrettLee My work fell short of what the paper claimed
3. When loading the provided pretrained model, the mask_loss between `predict_tensor` and `target_tensor` should be around 0.08
Hi @JingChaoLiu, do you mean that l1_loss(predict_tensor, target_tensor) is around 0.08, or that 5 * l1_loss(predict_tensor, target_tensor) is around 0.08? Thanks in advance.
5 * l1_loss(predict_tensor, target_tensor)
Mask Loss:

mask_loss = 5 * l1_loss(predict_tensor, target_tensor)
predict_tensor = Tensor[B=positive_bbox_num, C={bg, fg}, H=28, W=28]

note: bg = background, fg = foreground. target_tensor has the same shape as predict_tensor, obtained by setting the bg channel to all zeros and the fg channel to the pyramid label.
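A minimal sketch of this target construction, assuming a `pyramid_labels` tensor of shape [B, 28, 28] is already available (hypothetical name):

```python
import torch
import torch.nn.functional as F

# pyramid_labels: [B, 28, 28], values in [0, 1]
bg = torch.zeros_like(pyramid_labels)                      # bg channel: all zeros
target_tensor = torch.stack([bg, pyramid_labels], dim=1)   # [B, 2, 28, 28]

# predict_tensor: [B, 2, 28, 28], the mask prediction (after sigmoid, see below)
mask_loss = 5 * F.l1_loss(predict_tensor, target_tensor)
```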
Hi @JingChaoLiu, I'm trying to train with the L1 loss, but the inputs quickly become zeros and the mask head stops learning. I implemented your suggestion to zero the input for background, but it didn't help:
# mask predictions for the positive proposals and their matched classes
input = mask_logits[positive_inds, labels_pos]
target = mask_targets
# zero the prediction wherever the target is background
input = torch.where(target > 0, input, torch.zeros_like(input))
l1_loss = torch.nn.L1Loss()
mask_loss = 5 * l1_loss(input, target)
As for the targets, everything is OK (inspected visually). I'm basically training plain maskrcnn-benchmark + pyramid labels + L1 loss. Pure maskrcnn-benchmark works fine and I'm just trying to improve it.
Is there anything I should tweak to make this loss converge? Thanks in advance for any suggestions.
Regards,
I found I was also using the PMTD mask predictor "MaskRCNNC4Predictor_Upsample". When I switched back to "MaskRCNNC4Predictor", the mask loss seems alright now :) Any idea why the bilinear upsampling causes trouble?
I'm basically training plain maskrcnn-benchmark + pyramid labels + L1 loss.
In plain maskrcnn-benchmark, the mask loss is calculated with binary_cross_entropy, which is implemented as mask.sigmoid() plus a dot product. When training with the pyramid label, append mask = mask.sigmoid() explicitly after the mask prediction, then calculate the L1 loss between the sigmoid mask and the pyramid label. Maybe we appended a sigmoid() inside MaskRCNNC4Predictor_Upsample.
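To make this concrete, a minimal sketch of the corrected loss, reusing the variable names from the snippet above (mask_logits, positive_inds, labels_pos, mask_targets):

```python
import torch.nn.functional as F

# append the sigmoid explicitly: L1 loss, unlike binary_cross_entropy,
# does not apply it internally
mask_pred = mask_logits[positive_inds, labels_pos].sigmoid()
mask_loss = 5 * F.l1_loss(mask_pred, mask_targets)
```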
@JingChaoLiu, thank you for pointing this out to me. Indeed, there is a sigmoid() in your MaskRCNNC4Predictor_Upsample head, and at this moment it is the cause of the bad convergence (if I add it to the MaskRCNNC4Predictor head, I get the same effect). On the other hand, if I train with no sigmoid (which in theory is nothing wrong?), I get really nice mask pyramids... but the spread between predictions is very low; all values are really close to 0.5, and the pyramid "generator" doesn't work even if I change some "constants" inside (probably some kind of rescaling would help, but I don't want to introduce more "tweaking", and I suspect the real problem is in training).
Well, I guess I will struggle with it for some more time.
My plan for now is to:
- inspect gradients in the sigmoid() version (sketched below)
- train the non-sigmoid() version longer
- freeze batch norm (?)
- try a different loss like MSE

Any other suggestions? :)
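For the gradient inspection, I'd start with something like this minimal PyTorch hook sketch (mask_logits as in my snippet above):

```python
# log gradient statistics flowing into the mask logits during loss.backward();
# if the sigmoid saturates, grad.abs().mean() should collapse towards zero
def log_grad(grad):
    print(f"mask grad: mean={grad.abs().mean():.3e}, max={grad.abs().max():.3e}")

mask_logits.register_hook(log_grad)
```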
Same issues. Any solutions?
Is there any training code available for reference?
Could you provide your implementation details of PMTD and the evaluation score on ICDAR 2017? I'm afraid that I can't help you without these details. You may need to pay attention to these common details:
Train Stage
Train Scheduler
We train PMTD using 32 (noted as gpu_num) TITAN X 12G GPUs with SyncBatchNorm.
The learning rate changes as follows:
Loss Design
note: cls = classification, reg = regression
RPN Loss and Bounding Box Loss
The RPN Loss and Bounding Box Loss are the same as in Mask R-CNN, only changing the class_num from 81 to 2, e.g. via the config change sketched below.
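In maskrcnn-benchmark this is a one-line config change; a sketch assuming the standard yacs setup (the key MODEL.ROI_BOX_HEAD.NUM_CLASSES defaults to 81 there):

```python
from maskrcnn_benchmark.config import cfg

# background + text => 2 classes instead of COCO's 81
cfg.merge_from_list(["MODEL.ROI_BOX_HEAD.NUM_CLASSES", 2])
```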
Mask Loss:

mask_loss = 5 * l1_loss(predict_tensor, target_tensor)
predict_tensor = Tensor[B=positive_bbox_num, C={bg, fg}, H=28, W=28]

note: bg = background, fg = foreground. target_tensor has the same shape as predict_tensor, obtained by setting the bg channel to all zeros and the fg channel to the pyramid label.
Data augmentation, RPN Anchor and OHEM
These details should be implemented as the paper describes. Remember to randomly resize the image without keeping the aspect ratio (a minimal sketch follows).
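For example, a sketch of aspect-ratio-free random resizing (hypothetical scale range; PIL-based, not the exact PMTD augmentation):

```python
import random
from PIL import Image

def random_resize(image: Image.Image, scale_range=(0.5, 2.0)) -> Image.Image:
    # sample the width and height scales independently,
    # so the aspect ratio is deliberately NOT preserved
    sw = random.uniform(*scale_range)
    sh = random.uniform(*scale_range)
    new_w = max(1, int(image.width * sw))
    new_h = max(1, int(image.height * sh))
    # the ground-truth boxes and pyramid labels must be scaled by the same sw/sh
    return image.resize((new_w, new_h), Image.BILINEAR)
```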
Pyramid Label generation: when loading the provided pretrained model, the mask_loss between `predict_tensor` and `target_tensor` should be around 0.08. A sketch of the label itself is below.
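The pyramid label is a soft mask that is 1 at the box center and decays linearly to 0 at the box border. A minimal sketch for an axis-aligned box (the real implementation handles arbitrary quadrilaterals; this is only an illustration):

```python
import numpy as np

def pyramid_label(h: int, w: int) -> np.ndarray:
    """Soft mask for an h x w box: 1.0 at the center, linearly decaying
    to 0 at the border, like a pyramid seen from above."""
    py = 1 - np.abs(np.linspace(-1, 1, h))   # vertical profile, peak at center
    px = 1 - np.abs(np.linspace(-1, 1, w))   # horizontal profile, peak at center
    return np.minimum.outer(py, px)          # (h, w), values in [0, 1]
```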
Test Stage
Just follow the released code.