WZBSocialScienceCenter pdftabextract issues

A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

https://datascience.blog.wzb.eu/2017/02/16/data-mining-ocr-pdfs-using-pdftabextract-to-liberate-tabular-data-from-scanned-documents/

Apache License 2.0

2.21k stars, 369 forks

issues

dependencies version conflict

#26 uapdhyaybipul opened 4 months ago
1
Fix typo in documentation

#25 stweil closed 2 years ago
1
Code runs but does not detect any data

#24 gitsnehasish opened 3 years ago
0
docs: fix simple typo, specifiy -> specify

#23 timgates42 closed 2 years ago
0
pdftohtml not generating image tag in XML file

#22 tonivss closed 3 years ago
1
`Poppler` installation on Windows

#21 aiern closed 4 years ago
1
Logger file missing

#20 pcakhilnadh closed 4 years ago
1
Data Sources

#19 speakstone opened 5 years ago
1
Poppler pdftohtml produces incorrect output images for a Chinese PDF on Windows 7

#18 xiamaozi11 closed 5 years ago
1
Feature request: compatibility with the Google Vision API (document_text_detection)

#17 phoneee opened 5 years ago
0
Do not use `len(SEQUENCE)` to determine if a sequence is empty

#16 SivaAccionLabs opened 6 years ago
0
Please help me.

#15 eaglecoder1023 closed 6 years ago
1
Output is not coming

#14 shresthpaul133 closed 6 years ago
0
Problem in running script

#13 abhit17 closed 6 years ago
1
Question

#12 monkeydust closed 6 years ago
1
Not able to create vertical lines and recognize clusters

#11 skadambala closed 6 years ago
3
pdftohtml is not generating image files for the given pdf file.

#10 skadambala closed 6 years ago
0
pdftabextract does not label any text boxes

#9 jenniferzhu closed 6 years ago
5
pdftohtml -c -hidden -xml input.pdf output.xml

#8 jenniferzhu closed 6 years ago
1
Using pdftabextract on a PDF that was converted from a picture

#7 CapitaineNemo closed 6 years ago
1
pdftohtml does not create any scanned page with formats png and jpg

#6 salmansamie closed 6 years ago
1
Tagged releases and changelog

#5 baimafeima closed 6 years ago
0
Fix some typos (found by codespell)

#4 stweil closed 7 years ago
0
No text boxes in the output

#3 aborruso closed 7 years ago
1
jpeg8.dll does not exist

#2 aborruso closed 7 years ago
1
A question

#1 gjhkael closed 7 years ago
1