-
```
this is the error.log
Traceback (most recent call last):
  File "./peepdf.py", line 541, in <module>
    console.cmdloop()
  File "/usr/lib/python2.7/cmd.py", line 142, in cmdloop
    stop = self.onecmd(…
```
-
Is there any way to pass Tika a password for password-protected PDFs? According to the Tika documentation, this is supported:
https://tika.apache.org/1.6/api/org/apache/tika/parser/pdf/PDFParser.html
…
-
- PHP Version: 7.4
- PDFParser Version: 2.7
### Description:
I'm attempting to parse a document that is primarily tables. Most of the text is a jumble with white space and newlines missin…
-
In `ocd_backend.utils.file_parser` we use the python version of Apache Tika as a fallback when the mimetype is not 'application/pdf'. We use `pdfparser.poppler` as first choice since it has a native b…
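A minimal sketch of the selection logic described above, assuming the two parsers are available as plain callables that take a file path and return text. The names `pick_parser`, `extract_text`, `pdf_parser`, and `fallback_parser` are hypothetical and are not taken from `ocd_backend`; the real Poppler and Tika calls are deliberately left out.

```python
PDF_MIMETYPE = "application/pdf"


def pick_parser(mimetype, pdf_parser, fallback_parser):
    """Return the PDF parser for PDFs, otherwise the fallback parser."""
    if mimetype == PDF_MIMETYPE:
        return pdf_parser
    return fallback_parser


def extract_text(path, mimetype, pdf_parser, fallback_parser):
    """Extract text from `path` with whichever parser matches the mimetype."""
    parser = pick_parser(mimetype, pdf_parser, fallback_parser)
    return parser(path)


if __name__ == "__main__":
    # Stand-in callables (assumptions) in place of the Poppler and Tika wrappers.
    poppler_like = lambda path: "poppler:" + path
    tika_like = lambda path: "tika:" + path
    print(extract_text("report.pdf", "application/pdf", poppler_like, tika_like))
    print(extract_text("notes.docx", "application/msword", poppler_like, tika_like))
```

Passing the parsers in as arguments keeps the choice testable without either library installed.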
-
### What version of Bun is running?
1.0.35+940448d6b
### What platform is your computer?
Darwin 23.2.0 arm64 arm
### What steps can reproduce the bug?
If I attempt to use `pdf-text-reader` (which…
-
**Bug report**
Description: The height of character boxes is incorrect for some fonts. I removed the other fonts and graphical items from the PDF to isolate the problematic character boxes.
![image](htt…
-
Refers to #214, but it is not the same issue
I can't replicate #214 with the same use case because I'm getting a `malformed PDF file?` error.
Unfortunately I can't share the file, but this is the e…
-
There are individual PDF files that cannot be extracted. For these files, Tika, the PDF parser used by Solr, reports an error.
When external tools are used to val…