Result not reproducible for ICDAR2015

bado-lee commented 6 years ago

Hi, I'm trying to reproduce the results reported in the paper, but they don't seem to be reproducible.

I'm using the pre-trained model for ICDAR2015 downloaded from your link (model_icdar15.caffemodel).

I've tested in two ways. My parameters are listed below; everything else is unchanged, apart from the loops I added for batch inference.

For the evaluation, I'm using the official evaluation code downloaded from the ICDAR challenge.

Below is the link to the test set:

ICDAR Challenge 4 task1 aka ICDAR2015 task 1 TESTSET

Config 1

demo.py

'input_height' : 1024,
'input_width' : 1024,
'overlap_threshold' : 0.2,

deploy.prototxt

dim: 1024
dim: 1024

Result 1

Calculated!{"recall": 0.7448242657679345, "precision": 0.815068493150685, "hmean": 0.7783647798742138, "AP": 0}

Config 2

Same as Config 1, except in demo.py:

'overlap_threshold' : 0.5,

Result 2

Calculated!{"recall": 0.8078960038517092, "precision": 0.7198627198627199, "hmean": 0.7613430127041741, "AP": 0}

Neither result seems to match the performance reported in the paper.

Please let me know if there are other parameters I should tune.
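For anyone checking these numbers: the hmean values above are consistent with the harmonic mean of precision and recall. Here is a minimal sketch of that check; the function name `hmean` is my own and is not taken from the official evaluation script.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean (F-measure) of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was detected
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs copied from Result 1 and Result 2 above.
result_1 = hmean(0.815068493150685, 0.7448242657679345)
result_2 = hmean(0.7198627198627199, 0.8078960038517092)
print(round(result_1, 4), round(result_2, 4))
```

Raising overlap_threshold from 0.2 to 0.5 trades precision for recall here, and the combined score ends up slightly lower.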

MhLiao commented 6 years ago

We are having the Spring Festival and have no machine to test the code now. I will check the performance after the vacation.

bado-lee commented 6 years ago

Thanks a lot for the reply. I'll wait for your later response. Happy Holidays.

MhLiao commented 6 years ago

@bado-lee The previous demo includes detection and recognition, and the "f_score_threshold" is not optimal for reproducing the detection result for IC15. To reproduce the detection results, you can use "demo_det.py", which only includes detection. (You also need to change the input scale to 1024*1024.)

bado-lee commented 6 years ago

@MhLiao Thanks for the reply. I've checked and tested your new code, but it's still about 2 percentage points short.

'input_height' : 1024,
'input_width' : 1024,
'overlap_threshold' : 0.2,
'det_score_threshold' : 0.2,

I used the configuration above; the only parameter that differs from my previous test is det_score_threshold, which was previously 0.1.

Below is the result so far:

Calculated!{"recall": 0.7558979297063072, "precision": 0.8532608695652174, "hmean": 0.8016339034975747, "AP": 0}

This is still 1.6 percentage points below your Quad (Single) result. Please let me know if there are other factors I can try. Thanks in advance.

MhLiao commented 6 years ago

@bado-lee I am sorry that I gave a wrong model, which achieves lower performance. The link to the model file has been updated, so you can re-download it. The current model should achieve an F-measure of 0.816, which is comparable to the model in the paper.

bado-lee commented 6 years ago

@MhLiao Thank you very much for your feedback and help. I've finally achieved the reported performance:

Calculated!{"recall": 0.7891189215214252, "precision": 0.8523140925637025, "hmean": 0.8195, "AP": 0}

(That is even 0.2% more than the reported Quad result.)

And if I'm not mistaken, the published code is for Quad only.

Do you have plans to publish the code and model for Quad_MS as well? Please correct me if the code is for Quad_MS (if so, it still needs about 1% more to reach the Quad_MS score).

SHaiHosh commented 6 years ago

Isn't MS score just running the model with multiple input image scales - just like TextBoxes code?

bado-lee commented 6 years ago

@SHaiHosh I think the Multi-Scale mentioned in the paper by @MhLiao is an explicit multi-scale image input. Furthermore, the scores I've reproduced on ICDAR2015 match the Quad score.

The result I've reproduced with the code:

Calculated!{"recall": 0.7891189215214252, "precision": 0.8523140925637025, "hmean": 0.8195, "AP": 0}

Table 3

Table 2

MhLiao commented 6 years ago

@bado-lee @SHaiHosh The ms code is similar to TextBoxes. The scales include 384*384, 768*768, 1024*1024 and 1536*1536.
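A rough sketch of what multi-scale testing with these scales could look like is given below. This is not the repository's actual code: `detect_fn` is a hypothetical callable standing in for a single-scale forward pass, boxes are assumed to be axis-aligned with normalized coordinates (the real model outputs quadrilaterals), and a plain greedy NMS is used to merge the per-scale detections.

```python
from typing import Callable, List, Sequence, Tuple

# (xmin, ymin, xmax, ymax, score), coordinates normalized to [0, 1]
Box = Tuple[float, float, float, float, float]

# Input scales (height, width) listed in the comment above.
SCALES = [(384, 384), (768, 768), (1024, 1024), (1536, 1536)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], overlap_threshold: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= overlap_threshold for k in kept):
            kept.append(box)
    return kept


def detect_multi_scale(
    image,
    detect_fn: Callable[[object, int, int], List[Box]],  # hypothetical single-scale detector
    scales: Sequence[Tuple[int, int]] = SCALES,
    overlap_threshold: float = 0.2,
) -> List[Box]:
    """Run the detector once per scale, pool all boxes, then apply NMS."""
    pooled: List[Box] = []
    for height, width in scales:
        pooled.extend(detect_fn(image, height, width))
    return nms(pooled, overlap_threshold)
```

Because the boxes are assumed to be normalized, detections from different input sizes can be pooled directly without rescaling.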

SHaiHosh commented 6 years ago

thank you @MhLiao

DecentMakeover commented 5 years ago

Hi @bado-lee, where did you find the conf.lua file to run the demo.py file?

bado-lee commented 5 years ago

@DecentMakeover Hi, I have reproduced detection only, so I can't answer your question, sorry.

DecentMakeover commented 5 years ago

@bado-lee I wanted to see how TextBoxes compares to AdvancedEAST. Have you by any chance looked into AdvancedEAST?

MhLiao / TextBoxes_plusplus