Open VladimirKalachikhin opened 4 years ago
Did you try other pdfs?
Yes, I downloaded these files:
http://aif.centre-mersenne.org/article/AIF_1970__20_1_493_0.pdf as AIF_1970_493_498.pdf
http://aif.centre-mersenne.org/article/AIF_1999__49_2_375_0.pdf as AIF_1999_375_404.pdf
http://www.numdam.org/article/ASENS_1970_4_3_3_273_0.pdf as ASENS_1970_273_284.pdf
http://www.numdam.org/article/ASENS_1997_4_30_3_367_0.pdf as ASENS_1997_367_384.pdf
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC323452/pdf/pnas00314-0027.pdf as Borcherds86.pdf
http://www.numdam.org/article/BSMF_1970__98__165_0.pdf as BSMF_1970_165_192.pdf
http://www.numdam.org/article/BSMF_1998__126_2_245_0.pdf as BSMF_1998_245_271.pdf
http://people.virginia.edu/~lls2l/finite_dimensional.pdf as Cline88.pdf
Other files are unavailable.
The bounding boxes are placed correctly on the math regions only for Borcherds86.pdf and Cline88.pdf. For the other files the bounding boxes are fully displaced.
Dear sir,
I got the same errors too. Nine PDF files have displaced bounding boxes:
AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284,
Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271,
InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292.
The others match the labels well.
The following is AIF_1999_375_404.pdf, 1.png
Which version of pdf2image are you using?
I think I used the following version -
Name: pdf2image Version: 1.5.4
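For anyone else reporting here, the installed version can also be read from Python rather than from pip. This is only a small sketch using the standard library; the helper name installed_version is made up for illustration and is not part of pdf2image.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None when pdf2image is not installed.
print(installed_version("pdf2image"))
```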
Many of the PDF links are not available. Does anyone have a package of all the PDF files? Could you share a link via Google Drive, Baidu, or something else? Thanks.
The answer to questions: https://github.com/VladimirKalachikhin/marmot-to-ICDAR
I got the same problem on AIF_1999_375_404.pdf, 2.png, with pdf2image version 1.5.4. @MaliParag
Hi @VladimirKalachikhin , I have the same problem as you. I found that some images do not match their corresponding GT. Have you solved this problem now? Thank you!
Hi @MaliParag ,
Could you please share your image dataset with us? I found that different download channels and different versions of the PDF-to-PNG conversion tool can cause the images to not match the GT. We would be very grateful if you shared your dataset with us.
Have you solved this problem now?
I used MARMOT dataset, see above.
Hi @VladimirKalachikhin , can this data be converted to be the same as TFD-ICDAR2019? Or can only the format be made consistent, while the content stays different? Thanks!
I don't quite understand you. MARMOT is just another dataset. I created a simple tool to convert MARMOT to an ICDAR-compatible format so that the ICDAR tools can be used with it.
Thank you for your reply. I understand what you mean now.
Dear sir, I got the same errors too. Nine PDF files have displaced bounding boxes: AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284, Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271, InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292. The others match the labels well. The following is AIF_1999_375_404.pdf, 1.png
Could you share all the image datasets that you created with me? Thank you very much!
I got the data from here: https://github.com/MaliParag/TFD-ICDAR2019#download-instructions
The download link file.
NOTE: If you find the bounding boxes are displaced from math regions, it is because the document image that you have rendered is of different size than the one used while annotating. datasetV2 provides file sizes for each image. Resize the image that you have rendered to the size provided in datasetV2 and you should be able to use the annotations.
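The note above asks for the rendered image to be resized. A roughly equivalent alternative, when the only difference is scale, is to rescale the boxes instead. The sketch below assumes boxes are stored as (x_min, y_min, x_max, y_max) in pixels of the annotated image; that layout, the function name scale_box and the example sizes are assumptions for illustration, not taken from the dataset.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout
Size = Tuple[int, int]                   # (width, height) in pixels


def scale_box(box: Box, annotated_size: Size, rendered_size: Size) -> Box:
    """Map a box from the annotated image's pixel grid to the rendered image's grid."""
    ann_w, ann_h = annotated_size
    ren_w, ren_h = rendered_size
    sx = ren_w / ann_w  # horizontal scale factor
    sy = ren_h / ann_h  # vertical scale factor
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


# Hypothetical sizes: annotated at 1000x2000, rendered at 500x1000.
print(scale_box((100, 200, 300, 400), (1000, 2000), (500, 1000)))
```

This only helps if the mismatch is a pure change of scale. If the rendered page has different margins or cropping than the annotated one, scaling alone will probably not line the boxes up.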
datasetV2 provides file sizes for each image.
I know.
Yes, I rendered the images to the sizes from the file_sizes file, but the bounding boxes are still fully displaced. I see that page numbering in the math_gt .csv files starts from 0, but convert_pdf_to_image.py creates pages numbered from 1. Also, convert_pdf_to_image.py creates images whose sizes differ from those in file_sizes. I wrote my own convert_pdf_to_image, which renders images at the correct sizes. I tried numbering pages from 0 and from 1. Neither helped. I tried http://aif.centre-mersenne.org/article/AIF_1970__20_1_493_0.pdf as AIF_1970_493_498.pdf
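To rule out the two suspects mentioned here (page numbering offset and image size), a check along these lines might be useful. It is only a sketch: the dictionaries stand in for whatever is parsed from file_sizes and from the rendered PNGs, and find_size_mismatches and page_offset are hypothetical names, not part of the repository.

```python
from typing import Dict, List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def find_size_mismatches(
    expected: Dict[int, Size],
    rendered: Dict[int, Size],
    page_offset: int = 0,
) -> List[int]:
    """Return ground-truth page indices whose rendered size differs from the expected one.

    page_offset is added to a ground-truth page index to get the rendered page
    number (use 1 when the ground truth is 0-based and the rendered files are 1-based).
    A missing rendered page counts as a mismatch.
    """
    mismatched = []
    for gt_page, size in sorted(expected.items()):
        if rendered.get(gt_page + page_offset) != size:
            mismatched.append(gt_page)
    return mismatched


# Hypothetical data: ground truth is 0-based, rendered pages are 1-based.
expected_sizes = {0: (10, 20), 1: (30, 40)}
rendered_sizes = {1: (10, 20), 2: (30, 40)}
print(find_size_mismatches(expected_sizes, rendered_sizes, page_offset=1))
print(find_size_mismatches(expected_sizes, rendered_sizes, page_offset=0))
```

If the list is empty for one offset and the boxes are still displaced, then neither numbering nor size explains the problem, and the downloaded PDF itself may differ from the one that was annotated.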