-
I've tried out a few of the PDFs found in the `samples` directory and most had pretty bad formatting. Is this a known limitation or did something go horribly wrong somewhere? For example, trying out `…
-
Need to add the following functionality:
* Delete the file after uploading it
Consider adding the following features:
* Ability to download the text of a file already on Google Drive
* Use a…
-
**Bug report**
Unable to extract any images from the following PDFs. Could you please look into this?
I am using the following command:
`pdf2txt.py example.pdf --output-dir cats-and-dogs`
attachi…
-
To compare different pipelines (LLMs, pdf2img, pdf2txt) we need a benchmark.
## 1. Choose a subset of datasheets from each manufacturer
* consider special PDFs that need OCR
* scrambled text
#…
-
I can't relate the pdf2txt XML coordinates to any of the units of measurement I tried: pixels, centimeters, millimeters, points... None of them makes sense. Which unit is pdf2txt using?
Thank you very …
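
For anyone with the same question, here is a minimal sketch of the unit conversions. It assumes the coordinates are in default PDF user-space units (1/72 inch, i.e. points) with the origin at the bottom-left corner of the page, which is the usual PDF convention. The helper names are hypothetical and not part of pdf2txt, and a page that sets a non-default `UserUnit` would need an extra scale factor.

```python
# Assumption: bbox values are in PDF points (1/72 inch), origin at bottom-left.
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def points_to_mm(value_pt):
    """Convert a length in points to millimeters."""
    return value_pt * MM_PER_INCH / POINTS_PER_INCH


def points_to_pixels(value_pt, dpi):
    """Convert a length in points to pixels for an image rendered at `dpi`."""
    return value_pt * dpi / POINTS_PER_INCH


def bbox_to_top_left(bbox, page_height_pt):
    """Flip a (x0, y0, x1, y1) bbox from a bottom-left to a top-left origin."""
    x0, y0, x1, y1 = bbox
    return (x0, page_height_pt - y1, x1, page_height_pt - y0)


# Example: a box on a US Letter page (612 x 792 points).
box = (10, 700, 110, 720)
flipped = bbox_to_top_left(box, 792)
width_mm = points_to_mm(box[2] - box[0])
```

If the numbers still do not line up with a rendered image, the image DPI is the first thing to check, since pixel coordinates depend on it while points do not.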
-
# Description
Crash on non-ASCII input: `UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)`
# Steps to reproduce the bug
To make it easier, thi…
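
As an illustration of this class of error, the sketch below reproduces the same message with plain Python and shows one lenient way to decode such bytes. The sample bytes, the `decode_lenient` helper, and the choice of `cp1252` as a fallback are all assumptions made for the example; the actual encoding of the input in this report is not known.

```python
# Hypothetical sample input: byte 0x85 is not valid ASCII.
raw = b"\x85 caf\xe9"

try:
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    # Same kind of message as in the report above.
    message = str(exc)


def decode_lenient(data, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in order; never raise on bad bytes."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going and mark undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")


text = decode_lenient(raw)
```

Guessing an encoding can silently produce wrong characters, so passing the known encoding explicitly is preferable whenever it is available.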
-
Hi Shmuel.
I'm Yoshiki, Scott's postdoc.
Is it possible for your tool to handle nested lists? Here's an example: [example.pdf](https://github.com/user-attachments/files/16511184/example.pdf)
…
-
**Bug report**
The "How to use" section says to run the script [like this](https://github.com/pdfminer/pdfminer.six/blob/develop/README.md#how-to-use): `python pdf2txt.py ...`. However, after installing i…
-
I ran pdf2txt.py on a PDF whose content is in Chinese, and the output looks like this:
![image](https://github.com/FlagOpen/FlagData/assets/30501264/ea6a8caf-a67b-4766-8ff7-92226c5c4d67)
-
# Add pdfminer to requirements.
- In requirements.txt pdfminer isn't listed.
# When running PDF-to-Text option I get
```sh
TypeError: TextConverter.__init__() got an unexpected keyword argument …