bitextor/pdf-extract
PDF parser and converter to HTML
GNU General Public License v3.0
83 stars
14 forks
issues
Bump commons-io:commons-io from 2.8.0 to 2.14.0
#67
dependabot[bot]
opened
1 month ago
0
Install on mac
#66
EmmaYHlee
opened
7 months ago
0
Bump itextpdf from 5.5.13.1 to 5.5.13.3
#65
dependabot[bot]
closed
1 year ago
0
incompatible version protobuf
#64
avesgit
opened
2 years ago
1
protobuf does not contain autogen.sh
#63
avesgit
opened
2 years ago
1
Paracrawl Sentence Join tool does not exist
#62
Wikidepia
closed
3 years ago
1
Bump bcprov-jdk15on from 1.64 to 1.67
#61
dependabot[bot]
closed
2 years ago
0
Install for Win10
#60
flywire
opened
3 years ago
1
#57 PDFExtract is printing a lot of lines that just say "null"
#59
saito1972
closed
2 years ago
1
Bump commons-io from 2.6 to 2.7
#58
dependabot[bot]
closed
3 years ago
0
Sentence join fails when using a batch file
#57
zuny26
closed
3 years ago
5
Over 2M of trash files produced while crawling
#56
mbanon
closed
3 years ago
4
java.lang.Exception: This binary file contains trie with quantization and array-compressed pointers.
#55
lpla
closed
4 years ago
9
Show warning if "sentencejoin_model" path or a used file is missing
#54
lpla
closed
4 years ago
1
Installation instruction not working
#53
zuny26
closed
4 years ago
1
Bump commons-beanutils from 1.8.3 to 1.9.4
#52
dependabot[bot]
closed
4 years ago
0
Spurious warnings about sentenceJoin models
#51
Proyag
closed
4 years ago
0
Bad redirection of kenlm stderr
#50
Proyag
closed
4 years ago
2
Fix reading threshold as float from file
#49
Proyag
closed
4 years ago
0
Bump commons-collections from 3.2 to 3.2.2
#48
dependabot[bot]
closed
4 years ago
0
Bump commons-beanutils from 1.8.3 to 1.9.2
#47
dependabot[bot]
closed
4 years ago
1
catch and throw again anti-pattern
#46
kpu
closed
4 years ago
0
Race condition collecting output
#45
kpu
closed
4 years ago
0
mostHeight set but not read
#44
kpu
closed
4 years ago
0
Poppler rewrite
#43
MickeyVI
closed
4 years ago
0
Use stdin for pdftohtml
#42
kpu
closed
4 years ago
5
Link pattern does not allow nested tags
#41
kpu
closed
4 years ago
0
fontweight set but not read
#40
kpu
closed
4 years ago
0
Why can't bold have tags inside it?
#39
kpu
closed
4 years ago
2
Deadlock on stderr from pdftohtml
#38
kpu
closed
4 years ago
0
Sorting to get the maximum?
#37
kpu
closed
4 years ago
0
Deadlock if SentenceJoin writes to stderr
#36
kpu
closed
4 years ago
1
Semantics of LOADING state
#35
kpu
closed
4 years ago
0
Error handling
#34
kpu
closed
4 years ago
0
missing script and model as specified in config file
#33
amirkamran
closed
4 years ago
1
Added timeout option and removed poppler output file
#32
lpla
closed
4 years ago
0
Poppler rewrite
#31
lpla
closed
4 years ago
0
pdf-extract timeout option
#30
lpla
closed
4 years ago
3
PDFExtract.json should be passed as an argument
#29
amirkamran
closed
4 years ago
1
Restructure of folders and adding compile action
#28
amirkamran
closed
4 years ago
1
Make dependency installation optional in poppler-rewrite setup.sh
#27
lpla
closed
4 years ago
3
Branch poppler-rewrite does mark all sentences as lang="en" if protobuf not found
#26
lpla
closed
4 years ago
3
Branch poppler-rewrite does not extract any text
#25
lpla
closed
4 years ago
4
Run on CSD3
#24
kpu
closed
4 years ago
3
Revert "Poppler rewrite"
#23
lpla
closed
4 years ago
0
Exception in thread "main" java.lang.UnsatisfiedLinkError: /tmp/native-forcld3-350533629840224/libforcld3.so: libprotobuf.so.9: cannot open shared object file: No such file or directory
#22
lpla
closed
4 years ago
9
Document new dependencies
#21
lpla
closed
4 years ago
1
Poppler rewrite
#20
kpu
closed
4 years ago
0
Exception always
#19
msdobrescu
closed
4 years ago
3
pdf-extract in warc2htmlwarc uses >1 processor
#17
wwaites
closed
4 years ago
5