WZBSocialScienceCenter / pdftabextract

A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
https://datascience.blog.wzb.eu/2017/02/16/data-mining-ocr-pdfs-using-pdftabextract-to-liberate-tabular-data-from-scanned-documents/
Apache License 2.0
2.2k stars, 370 forks

Data Sources #19

Open: speakstone opened this issue 4 years ago

speakstone commented 4 years ago

Hello, my graduation thesis is also related to document image recognition. Can you give me your data source?

internaut commented 4 years ago

I have no idea what you mean by "data source". If you're referring to the examples: no, I can't provide anything other than the sample of scanned pages.