MaliParag / TFD-ICDAR2019

TFD-ICDAR 2019 Dataset for Typeset Math Formula Detection
http://crohme2019.cs.rit.edu/

Bounding boxes are displaced from math regions #3

Open VladimirKalachikhin opened 4 years ago

VladimirKalachikhin commented 4 years ago

Yes, I rendered the images at the sizes given in the file_sizes file, but the bounding boxes are still fully displaced.

I see that page numbering in the math_gt .csv files starts from 0, but convert_pdf_to_image.py creates pages numbered from 1. Also, convert_pdf_to_image.py creates images whose sizes differ from those in file_sizes.

I wrote my own convert_pdf_to_image that renders images at the correct sizes. I tried numbering pages from both 0 and 1. Neither helped.

I tried http://aif.centre-mersenne.org/article/AIF_1970__20_1_493_0.pdf as AIF_1970_493_498.pdf

MaliParag commented 4 years ago

Did you try other pdfs?

VladimirKalachikhin commented 4 years ago

Yes, I downloaded these files:

http://aif.centre-mersenne.org/article/AIF_1970__20_1_493_0.pdf as AIF_1970_493_498.pdf
http://aif.centre-mersenne.org/article/AIF_1999__49_2_375_0.pdf as AIF_1999_375_404.pdf
http://www.numdam.org/article/ASENS_1970_4_3_3_273_0.pdf as ASENS_1970_273_284.pdf
http://www.numdam.org/article/ASENS_1997_4_30_3_367_0.pdf as ASENS_1997_367_384.pdf
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC323452/pdf/pnas00314-0027.pdf as Borcherds86.pdf
http://www.numdam.org/article/BSMF_1970__98__165_0.pdf as BSMF_1970_165_192.pdf
http://www.numdam.org/article/BSMF_1998__126_2_245_0.pdf as BSMF_1998_245_271.pdf
http://people.virginia.edu/~lls2l/finite_dimensional.pdf as Cline88.pdf

Other files are unavailable.

Only for Borcherds86.pdf and Cline88.pdf are the bounding boxes placed correctly on the math regions. For the other files the bounding boxes are fully displaced.

BigPandaCPU commented 4 years ago

Dear sir, I got the same errors too. There are 9 PDF files whose bounding boxes are displaced: AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284, Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271, InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292. The others match the labels well. One example is 1.png from AIF_1999_375_404.pdf.

MaliParag commented 4 years ago

Which version of pdf2image are you using?

I think I used the following version:

Name: pdf2image
Version: 1.5.4

macqueen09 commented 4 years ago

Many of the PDF links are not available. Does anyone have a package of all the PDF files? Could you share a link via Google Drive, Baidu, or something else? Thanks.

VladimirKalachikhin commented 4 years ago

The answer to questions: https://github.com/VladimirKalachikhin/marmot-to-ICDAR

humeme commented 4 years ago

I got the same problem on AIF_1999_375_404.pdf, 2.png, with pdf2image version 1.5.4. @MaliParag

Jeozhao commented 3 years ago

Hi @VladimirKalachikhin , I have the same problem as you. I found that some images do not match their corresponding GT. Have you solved this problem now? Thank you!

Jeozhao commented 3 years ago

Hi @MaliParag ,

Could you please share your image dataset with us? I found that different download sources and different versions of the PDF-to-PNG conversion tool may cause the images not to match the GT. We would be very grateful if you shared your dataset with us.

VladimirKalachikhin commented 3 years ago

> Have you solved this problem now?

I used MARMOT dataset, see above.

Jeozhao commented 3 years ago

> > Have you solved this problem now?
>
> I used MARMOT dataset, see above.

Hi @VladimirKalachikhin, can this data be converted to be the same as TFD-ICDAR2019? Or is it only the format that can be kept consistent, while the content differs? Thanks!

VladimirKalachikhin commented 3 years ago

I don't quite understand you. MARMOT is just another dataset. I created a simple tool to convert MARMOT to an ICDAR-compatible format so that the ICDAR tools can be used with it.

Jeozhao commented 3 years ago

> I don't quite understand you. MARMOT is just another dataset. I created a simple tool to convert MARMOT to an ICDAR-compatible format so that the ICDAR tools can be used with it.

Thank you for your reply. I understand what you mean now.

ducMNSD commented 3 years ago

> Dear sir, I got the same errors too. There are 9 PDF files whose bounding boxes are displaced: AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284, Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271, InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292. The others match the labels well. One example is 1.png from AIF_1999_375_404.pdf.

Could you share with me all the image datasets that you created? Thank you very much!

BigPandaCPU commented 3 years ago

> > Dear sir, I got the same errors too. There are 9 PDF files whose bounding boxes are displaced: AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284, Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271, InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292. The others match the labels well. One example is 1.png from AIF_1999_375_404.pdf.
>
> Could you share with me all the image datasets that you created? Thank you very much!

I got the data from here: https://github.com/MaliParag/TFD-ICDAR2019#download-instructions

The download link file.

MingchangLi commented 3 years ago

NOTE: If you find the bounding boxes are displaced from math regions, it is because the document image that you have rendered is of different size than the one used while annotating. datasetV2 provides file sizes for each image. Resize the image that you have rendered to the size provided in datasetV2 and you should be able to use the annotations.
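
To make the note above concrete, here is a minimal Python sketch under my own assumptions. Instead of resizing the rendered image, it does the equivalent inverse: it scales the annotated boxes to whatever size you rendered at. It also flags pages whose aspect ratio differs from the annotated one, which may point to a different PDF version rather than a plain resolution mismatch. The function names, the `(x_min, y_min, x_max, y_max)` box layout, and the 1% tolerance are illustrative choices, not something taken from the dataset tools.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Size = Tuple[int, int]  # (width, height) in pixels


def scale_boxes(boxes: List[Box], annotated_size: Size, rendered_size: Size) -> List[Box]:
    """Map boxes from the annotated image size to the rendered image size."""
    ann_w, ann_h = annotated_size
    ren_w, ren_h = rendered_size
    sx = ren_w / ann_w  # horizontal scale factor
    sy = ren_h / ann_h  # vertical scale factor
    return [(x0 * sx, y0 * sy, x1 * sx, y1 * sy) for x0, y0, x1, y1 in boxes]


def aspect_ratio_differs(annotated_size: Size, rendered_size: Size,
                         tolerance: float = 0.01) -> bool:
    """Return True if the two sizes differ in aspect ratio by more than `tolerance`.

    Scaling cannot repair a displacement caused by a different page layout,
    so a mismatch here is worth checking before trusting the boxes.
    """
    ann_ratio = annotated_size[0] / annotated_size[1]
    ren_ratio = rendered_size[0] / rendered_size[1]
    return abs(ann_ratio - ren_ratio) / ann_ratio > tolerance
```

If `aspect_ratio_differs` returns True for a page, resizing alone is unlikely to line the boxes up, and it is probably worth comparing the downloaded PDF against the one used for annotation.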

VladimirKalachikhin commented 3 years ago

> datasetV2 provides file sizes for each image.

I know.