Closed: zjuDebug closed this issue 6 years ago
@zjuDebug Can you describe your training details, such as training data, batch size, iterations, and so on?
@MhLiao Hello, I am sorry for the late reply. Here are the details:

- train batch size: 32
- train data: VGG 80k SynthText (but I split it into train and test; I use the train lmdb for training and the icdar2013 test lmdb for testing)
- iterations: 51500 (this checkpoint has the biggest detection_eval in the test phase, 0.546)
- learning rate: 0.001 for iterations 0-40000; 0.0001 for iterations 40000-51500; 0.0001 for 51500 to 51500+2k (fine-tuning on the icdar2013 train data)

And I got the result:
Can you give me some advice? Thanks in advance.
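The learning-rate schedule described above is a standard step decay (this matches Caffe's `step` policy with `gamma = 0.1` and `stepsize = 40000`); a minimal sketch with a hypothetical helper, not code from the repo:

```python
def learning_rate(iteration, base_lr=1e-3, gamma=0.1, stepsize=40000):
    """Step decay: lr = base_lr * gamma ** floor(iteration / stepsize).
    Gives 0.001 before iteration 40000 and 0.0001 afterwards."""
    return base_lr * gamma ** (iteration // stepsize)
```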
@zjuDebug Did you test with 700*700 as the single input size? It seems you got even worse results when using multiple scales. I suggest you try changing the input size when fine-tuning the model, which may be helpful.
@MhLiao I am sorry for the late reply. First of all, thanks a lot. Following your advice, I did some experiments:

1. I fine-tuned on the icdar2013 train data using the 700*700 scale (batch size = 8) and got the best caffemodel at iteration 3k. The results:
   - single scale (700*700 test), best result: score > 0.2, 0.8035
   - multi-scale, best result: score > 0.8, 0.8329 (nms = 0.24)

But in your paper the best result is 0.851. Can you give me some other advice? Thanks in advance.
Did you mean I should fine-tune on icdar2013 with multiple scales as data augmentation?
Thanks in advance.
@zjuDebug You can try multi-scale data augmentation, such as (300,300) and (700,700), which may benefit the robustness of the model.
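A minimal sketch of such multi-scale augmentation (hypothetical helpers; the image is represented as a plain list of pixel rows and resized with nearest-neighbor for brevity, whereas a real pipeline would use a proper interpolation mode as in the Caffe `resize_param`):

```python
import random

# Candidate (width, height) training sizes from the discussion above.
SCALES = [(300, 300), (700, 700)]

def pick_scale(rng=random):
    """Pick one input size per image at random (multi-scale training)."""
    return rng.choice(SCALES)

def resize_nearest(img, new_w, new_h):
    """Nearest-neighbor resize of an image stored as a list of pixel rows."""
    old_h, old_w = len(img), len(img[0])
    return [[img[y * old_h // new_h][x * old_w // new_w]
             for x in range(new_w)]
            for y in range(new_h)]
```

Each training image is then resized to `pick_scale()` before being fed to the network, so the model sees text at varying resolutions.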
@MhLiao I am sorry for the late reply, and thanks a lot. I followed your advice and rescaled the icdar2013 train data with scales = ((300,300), (700,700), (700,500), (700,300), (1600,1600)) as data augmentation for fine-tuning. Because the image scale changes during training, I set the train batch size to 1, and got the best detection_eval = 0.7398 on the icdar2013 test lmdb (fine-tune iterations = 37500), which is lower than the single-scale 700*700 result of 0.77 (iterations = 3k). Finally I tested the model with multi-scale testing and got f-measure = 0.8125 (conf = 0.4), which is lower than the single-scale fine-tuning result (0.8329). Is this fine-tuning method not very reasonable?
@zjuDebug (1600,1600) does not seem suitable for training. Actually, I have tried using (300,300) and (700,700) to fine-tune the model, which improved the performance at the single scale (700,700).
@MhLiao I am sorry for the late reply, and thank you very much. I followed your advice:

1. fine-tune using 300*300
2. fine-tune using 700*700, starting from the caffemodel obtained in step 1
3. test with scales = ((300,300), (700,700), (700,500), (700,300), (1600,1600)): 0.8326
4. test with scale = ((700,700)): 0.8114 (which is better than the 0.8035 obtained using only 700*700 fine-tuning)

Finally, since the (300,300) scale did not seem to improve the multi-scale result, I ran another test with scales = ((700,700), (700,500), (700,300), (1600,1600)) and got 0.8375. I think this is my best result so far. Note that I only used 80% of the VGG 80k SynthText to train TextBoxes.
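For the multi-scale tests above, detections from every scale are mapped back to the original image and merged with non-maximum suppression; a minimal sketch (hypothetical helpers, using the nms = 0.24 threshold quoted earlier; boxes are (x1, y1, x2, y2, score) tuples already in original-image coordinates):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, score) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def multi_scale_nms(detections_per_scale, thresh=0.24):
    """Pool detections from all test scales and keep the highest-scoring
    boxes, suppressing any box overlapping a kept box by >= thresh."""
    boxes = sorted((b for dets in detections_per_scale for b in dets),
                   key=lambda b: b[4], reverse=True)
    keep = []
    for b in boxes:
        if all(iou(b, k) < thresh for k in keep):
            keep.append(b)
    return keep
```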
By the way, I replaced the 1*5 kernels in the fc7_mbox_loc and fc7_mbox_conf layers with a BLSTM (the same as CTPN) to extract features at every pixel of the feature map, but the result is not as good as the model in your paper. Can you give me some advice? Thanks in advance!
@zjuDebug That's an interesting experiment! If you use the feature map of fc7 only, the receptive field may be limited. You could analyze the missed texts (e.g., are they too small or too large?).
Hi MhLiao,
I am trying to replicate your results and want to run this application on Windows 7 64 bit, with Python 3.5. When I am running test_icdar13.py, I am getting caffe errors. Could you please help me how can I run this application on windows? Thanks.
@zjuDebug Could you provide your training details for fine-tuning at the 700*700 scale? I don't know why the results with the 700*700 scale are much worse than those with the 300*300 scale in my experiments. Could you give some advice to fix this problem?
@HerShawn train batchsize = 8, base_lr = 10^(-4), iteration: 2k+
@zjuDebug Hi, could you please tell me how to calculate the precision, recall, and f-measure? I know the author mentioned them in the paper, but I don't know how to calculate them with the code. Thanks in advance.
@zjuDebug Hi, since you split SynthText into train and test, why do you use icdar2013 rather than the SynthText test lmdb for testing? (What is your SynthText test lmdb used for?)
How do you split SynthText? (stochastically, or by certain rules?)
Thanks.
@HelloTobe There is some MATLAB code to compute precision, recall, and f-measure in CAFFE_ROOT/examples/TextBoxes.
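The three metrics themselves are simple ratios; a minimal sketch in Python (a hypothetical helper, not the MATLAB code from the repo), where `num_matched` is the number of detections matched to a ground-truth box:

```python
def precision_recall_f(num_matched, num_detected, num_gt):
    """ICDAR-style detection metrics: precision = matched / detected,
    recall = matched / ground truth, f-measure = their harmonic mean."""
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```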
@HelloTobe Because I wanted to re-implement the result in the paper, my method does not strictly conform to standard machine-learning practice.
I split it stochastically, and I think this is not the key point; the key point is fine-tuning on icdar2013 with the proper scales.
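A stochastic split like the one mentioned can be sketched as follows (hypothetical helper; the 80/20 default matches the "80% of SynthText" figure quoted earlier):

```python
import random

def split_dataset(items, train_frac=0.8, seed=0):
    """Shuffle and split a dataset stochastically, e.g. 80% train / 20% test.
    A fixed seed keeps the split reproducible across runs."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```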
@zjuDebug Thanks very much!
@MhLiao Hello, I want to re-implement the result in your paper, but I got a much lower f-measure of 55% (single scale). Below is the solver, which was created using your default code:

```
train_net: "models/TextBoxes/train.prototxt"
test_net: "models/TextBoxes/test.prototxt"
test_iter: 233
test_interval: 500
base_lr: 0.0001
display: 10
max_iter: 120000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 60000
snapshot: 500
snapshot_prefix: "models/TextBoxes/snapshots2/"
solver_mode: GPU
device_id: 0
debug_info: false
snapshot_after_train: true
test_initialization: false
average_loss: 10
iter_size: 1
type: "SGD"
eval_type: "detection"
ap_version: "11point"
```

Note that the stepsize is 40000 in your paper. Below are my train.prototxt's transform parameters:

```
transform_param {
  mirror: false
  mean_value: 104
  mean_value: 117
  mean_value: 123
  resize_param {
    prob: 1
    resize_mode: WARP
    height: 300
    width: 300
    interp_mode: LINEAR
    interp_mode: AREA
    interp_mode: NEAREST
    interp_mode: CUBIC
    interp_mode: LANCZOS4
  }
  emit_constraint {
    emit_type: CENTER
  }
}
```
Can you give the details of your solver.prototxt and transform parameters, or some advice? Thanks a lot!