Yuliang-Liu / Curve-Text-Detector

This repository provides train&test code, dataset, det.&rec. annotation, evaluation script, annotation tool, and ranking.

Question about regression target and info_syn_transform #21

ZhuLingfeng1993 closed this issue 5 years ago

ZhuLingfeng1993 commented 5 years ago

I found that the regression target in the function info_syn_transform_hw in lib/fast_rcnn/bbox_transform.py is:

    targets_dp1h = ( gt_p1h - ex_heights) * 0.5 / ex_heights
    ...
    encode_0 = np.zeros_like(targets_dp1w)
    targets = np.vstack((
        encode_0, encode_0,
        targets_dp1h, targets_dp2h, targets_dp3h, targets_dp4h, targets_dp5h,
        targets_dp6h, targets_dp7h, targets_dp8h, targets_dp9h, targets_dp10h,
        targets_dp11h, targets_dp12h, targets_dp13h, targets_dp14h,
        encode_0, encode_0,
        targets_dp1w, targets_dp2w, targets_dp3w, targets_dp4w, targets_dp5w,
        targets_dp6w, targets_dp7w, targets_dp8w, targets_dp9w, targets_dp10w,
        targets_dp11w, targets_dp12w, targets_dp13w, targets_dp14w,
    )).transpose() # 44

which is different from the form of the parameterized offset in equation (1) of the paper. Can you explain this?
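For concreteness, the quoted height target can be worked through for a single proposal. This is a minimal sketch that only mirrors the one line shown above; the function name `height_target` and the sample numbers are hypothetical, and plain Python floats stand in for the NumPy arrays used in the repository. It makes no claim about what `gt_p1h` represents beyond what the excerpt shows.

```python
def height_target(gt_ph, ex_height):
    """Mirror of the quoted line: (gt_p1h - ex_heights) * 0.5 / ex_heights."""
    return (gt_ph - ex_height) * 0.5 / ex_height


# Hypothetical values: proposal height 20, gt_p1h equal to 30.
t = height_target(30.0, 20.0)
print(t)  # 0.25
```

With these numbers the difference of 10 is halved and then divided by the proposal height, so the target is 0.25; it is 0 whenever `gt_p1h` equals the proposal height.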

In addition, I don't understand the purpose of the four extra regression items (the x, y minimum and maximum of the circumscribed rectangle). I see that they are just encode 0 in the regression targets. I hope you can help me understand this.

Yuliang-Liu commented 5 years ago

Hi,

The extra GT x, y minimum and maximum of the circumscribed rectangle are only used for the circumscribed rectangle regression branch (this is easier to see in a Netscope visualization):

    layer {
        bottom: "conv_new_1"
        top: "rfcn_bbox"
        name: "rfcn_bbox"
        type: "Convolution"
        convolution_param {
            num_output: 392 # 2*4*(7^2) cls_num*cors*(score_maps_size^2)
            kernel_size: 1
            pad: 0
            weight_filler {
                type: "gaussian"
                std: 0.01
            }
            bias_filler {
                type: "constant"
                value: 0
            }
        }
        param {
            lr_mult: 1.0
        }
        param {
            lr_mult: 2.0
        }
    }
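The `num_output` value of this layer can be sanity-checked with simple arithmetic. The sketch below assumes the inline comment on `num_output` is meant as a product of three factors, 2 * 4 * 7^2, that is, class count times box coordinates times score-map size squared. The variable names and the meaning given to each factor are assumptions for illustration, not taken from the repository.

```python
cls_num = 2          # assumed: two classes (e.g. text vs. background)
cors = 4             # assumed: x/y minimum and maximum of the rectangle
score_maps_size = 7  # from the "(7^2)" in the layer comment

num_output = cls_num * cors * score_maps_size ** 2
print(num_output)  # 392
```

The product matches the `num_output: 392` in the layer definition.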

As for encode 0: we found that when using LSTM units, the head of the time sequence converges very slowly, which can be solved by padding with two encode 0 sequences. If you try adding four encode 0 sequences, the results are similar to using two.
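The padding described in this reply can be sketched as follows. This illustrates the column layout only and is not the repository's code: `pad_sequence` and `build_row` are hypothetical helpers, the offset values are made up, and plain Python lists replace the NumPy arrays so the sketch has no dependencies.

```python
def pad_sequence(offsets, n_pad=2):
    """Prepend n_pad zeros ("encode 0") to one offset sequence."""
    return [0.0] * n_pad + list(offsets)


def build_row(h_offsets, w_offsets, n_pad=2):
    """Lay out one target row: padded height offsets, then padded width offsets."""
    return pad_sequence(h_offsets, n_pad) + pad_sequence(w_offsets, n_pad)


# 14 made-up offsets per axis, matching targets_dp1h..14h and targets_dp1w..14w.
h = [0.1] * 14
w = [0.2] * 14
row = build_row(h, w)
print(len(row))             # 32
print(row[:3], row[16:19])  # [0.0, 0.0, 0.1] [0.0, 0.0, 0.2]
```

Each 14-step sequence starts with two zeros, so the recurrent units see two padding steps before the first real offset. Calling `build_row(h, w, n_pad=4)` gives the four-zero variant mentioned in the reply.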

BTW, the paper does not cover every detail, so if anything confuses you, please feel free to let me know.

Hope this helps.

Regards, yl

ZhuLingfeng1993 commented 5 years ago

Thank you for your reply.