WZBSocialScienceCenter/pdftabextract
A set of tools for extracting tables from PDF files, helping to do data mining on (OCR-processed) scanned documents.
https://datascience.blog.wzb.eu/2017/02/16/data-mining-ocr-pdfs-using-pdftabextract-to-liberate-tabular-data-from-scanned-documents/
Apache License 2.0
2.21k stars
369 forks
issues
dependencies version conflict
#26
uapdhyaybipul
opened
4 months ago
1
Fix typo in documentation
#25
stweil
closed
2 years ago
1
Code is running but not detecting any data.
#24
gitsnehasish
opened
3 years ago
0
docs: fix simple typo, specifiy -> specify
#23
timgates42
closed
2 years ago
0
pdftohtml not generating image tag in XML file
#22
tonivss
closed
3 years ago
1
`Poppler` installation on Windows
#21
aiern
closed
4 years ago
1
Logger file missing
#20
pcakhilnadh
closed
4 years ago
1
Data Sources
#19
speakstone
opened
5 years ago
1
Poppler pdftohtml produces incorrect output images for my PDF (Windows 7, Chinese-language PDF)
#18
xiamaozi11
closed
5 years ago
1
Feature request: compatibility with Google Vision API (document_text_detection)
#17
phoneee
opened
5 years ago
0
Do not use `len(SEQUENCE)` to determine if a sequence is empty
#16
SivaAccionLabs
opened
6 years ago
0
Please help me.
#15
eaglecoder1023
closed
6 years ago
1
Output is not coming
#14
shresthpaul133
closed
6 years ago
0
Problem in running script
#13
abhit17
closed
6 years ago
1
Question
#12
monkeydust
closed
6 years ago
1
Not able to create vertical lines and recognize clusters
#11
skadambala
closed
6 years ago
3
pdftohtml is not generating image files for the given pdf file.
#10
skadambala
closed
6 years ago
0
pdftabextract does not label any text boxes
#9
jenniferzhu
closed
6 years ago
5
pdftohtml -c -hidden -xml input.pdf output.xml
#8
jenniferzhu
closed
6 years ago
1
Use pdftabextract to convert a PDF that was created from a picture
#7
CapitaineNemo
closed
6 years ago
1
pdftohtml does not create any scanned page with formats png and jpg
#6
salmansamie
closed
6 years ago
1
Tagged releases and changelog
#5
baimafeima
closed
6 years ago
0
Fix some typos (found by codespell)
#4
stweil
closed
7 years ago
0
No text boxes in the output
#3
aborruso
closed
7 years ago
1
jpeg8.dll does not exist
#2
aborruso
closed
7 years ago
1
A question
#1
gjhkael
closed
7 years ago
1