pdf2txt Search Results - Githubissues

pdfminer/pdfminer.six #1038

pdf2txt.py HTML output has bad formatting

I've tried out a few of the PDFs found in the `samples` directory and most had pretty bad formatting. Is this a known limitation or did something go horribly wrong somewhere? For example, trying out `…

f-t-alves updated 1 week ago

jlownie/jl-tools #1

New features for pdf2txt

Need to add the following functionality: * Delete the file after uploading it Consider adding the following features: * Ability to download the text of a file already on Google Drive * Use a…

jlownie updated 7 months ago

pdfminer/pdfminer.six #1050

PDF miner unable to extract images for some pdfs

**Bug report** Unable to extract any images from the following PDFs. Could you please look into this? I am using the command below: `pdf2txt.py example.pdf --output-dir cats-and-dogs` attachi…

vanshika-panwar updated 1 week ago

piotrdelikat/fet-data-extractor #4

Benchmark

To compare different pipelines (LLMs, pdf2img, pdf2txt) we need a benchmark. ## 1. Choose a subset of datasheets from each manufacturer * consider special PDFs that need OCR * scrambled text #…

fl4p updated 1 month ago
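Issue #4 only outlines the benchmark plan. Below is a hypothetical way such a comparison could be scored. Each pipeline's extracted text is compared against a hand-made reference transcription using `difflib.SequenceMatcher.ratio()` from the Python standard library. The metric, the whitespace normalisation, and the function and argument names are illustrative assumptions, not something the issue specifies.

```python
# Hypothetical benchmark scoring (not from issue #4): compare each pipeline's
# extracted text with a reference transcription using difflib's similarity
# ratio (0.0 = nothing in common, 1.0 = identical after normalisation).
from difflib import SequenceMatcher
from statistics import mean


def normalize(text: str) -> str:
    """Collapse all whitespace runs so layout differences do not dominate."""
    return " ".join(text.split())


def similarity(reference: str, candidate: str) -> float:
    """Character-level similarity between normalised reference and candidate."""
    return SequenceMatcher(None, normalize(reference), normalize(candidate)).ratio()


def score_pipelines(references: dict, outputs: dict) -> dict:
    """references: {doc_id: text}; outputs: {pipeline: {doc_id: text}}.

    Returns the mean similarity per pipeline. A document a pipeline did not
    produce is compared against an empty string.
    """
    scores = {}
    for pipeline, docs in outputs.items():
        per_doc = [
            similarity(ref, docs.get(doc_id, ""))
            for doc_id, ref in references.items()
        ]
        scores[pipeline] = mean(per_doc) if per_doc else 0.0
    return scores
```

A character-level ratio is crude. It penalises reordered columns as heavily as wrong characters, so a real benchmark might also want field-level checks for specific datasheet parameters.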

euske/pdfminer #74

Pdf2txt xml coordinates

I can't relate the pdf2txt XML coordinates to any of the units of measurement I tried: pixels, centimeters, millimeters, points... None of them makes sense. Which unit is pdf2txt using? Thank you very …

MiguelCanto updated 10 years ago
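The question above asks which unit pdf2txt's XML coordinates use. pdfminer reports `bbox` values in PDF user-space units. By default these are points (1/72 inch), measured from the bottom-left corner of the page. Under that assumption, the sketch converts a bbox to millimetres and to top-left-origin pixel coordinates. It further assumes the XML `bbox` attribute is a comma-separated `x0,y0,x1,y1` string. The helper names and the 96 DPI default are hypothetical, not part of pdfminer.

```python
# Sketch: convert pdfminer-style bbox values, assumed to be PDF points
# (1/72 inch) with a bottom-left origin, into millimetres and into
# top-left-origin pixel coordinates at a chosen DPI.

POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def pt_to_mm(value_pt: float) -> float:
    """Convert a length in PDF points to millimetres."""
    return value_pt * MM_PER_INCH / POINTS_PER_INCH


def parse_bbox(attr: str) -> tuple:
    """Parse an assumed comma-separated bbox string like "72,720,144,738"."""
    return tuple(float(part) for part in attr.split(","))


def bbox_to_pixels(bbox, page_height_pt: float, dpi: float = 96.0) -> tuple:
    """Map (x0, y0, x1, y1) in points (bottom-left origin) to
    (left, top, right, bottom) in pixels (top-left origin)."""
    x0, y0, x1, y1 = bbox
    scale = dpi / POINTS_PER_INCH
    left = x0 * scale
    right = x1 * scale
    # Flip the y axis: PDF y grows upward, image y grows downward.
    top = (page_height_pt - y1) * scale
    bottom = (page_height_pt - y0) * scale
    return (left, top, right, bottom)
```

For example, on a US Letter page (792 pt tall) rendered at 72 DPI, `bbox_to_pixels((72, 720, 144, 738), 792, dpi=72)` returns `(72.0, 54.0, 144.0, 72.0)`. The page height must come from the page's own `bbox` or media box, because the y flip depends on it.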

pdfminer/pdfminer.six #1032

Crash on non-ASCII input.

# Description Crash on non-ASCII input: `UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)` # Steps to reproduce the bug To make it easier, thi…

vk2diy updated 3 months ago
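The error quoted in #1032 is the standard message Python raises when bytes of 0x80 or above are decoded with the ASCII codec. The snippet is cut off before the root cause, so the sketch below shows only where that message comes from, plus one generic fallback strategy for decoding such bytes. The encoding order (UTF-8, then cp1252, then latin-1) is an assumption chosen for illustration, not the fix applied in the issue.

```python
# Sketch: where "'ascii' codec can't decode byte 0x85 ..." comes from, and a
# generic fallback decoder. Encoding order is an illustrative assumption.

RAW = b"\x85"  # 0x85 lies outside ASCII's 0-127 range


def decode_strict_ascii(data: bytes) -> str:
    """Decode as ASCII; raises UnicodeDecodeError on any byte >= 0x80."""
    return data.decode("ascii")


def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Try each encoding in turn; latin-1 maps every byte, so it never fails."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1")


if __name__ == "__main__":
    try:
        decode_strict_ascii(RAW)
    except UnicodeDecodeError as exc:
        print(exc)  # same wording as the message quoted in the issue
    print(repr(decode_with_fallback(RAW)))  # cp1252 maps 0x85 to an ellipsis
```

Byte 0x85 is not valid UTF-8 on its own, so the fallback reaches cp1252, which decodes it as `…` (U+2026). When the ASCII codec is picked up implicitly from the environment, opening files with an explicit `encoding=` argument is the usual first thing to check.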

shmublu/pdf2txt #1

Handling Nested Lists

Hi Shmuel. I'm Yoshiki, Scott's postdoc. Is it possible for your tool to handle nested lists? Here's an example: [example.pdf](https://github.com/user-attachments/files/16511184/example.pdf) …

YoshikiTakashima updated 3 months ago

pdfminer/pdfminer.six #660

Incorrect installation/usage instructions?

**Bug report** How to use section says to run the script [like this](https://github.com/pdfminer/pdfminer.six/blob/develop/README.md#how-to-use): `python pdf2txt.py ...`. However after installing i…

Bouke updated 6 months ago

FlagOpen/FlagData #16

Running pdf2txt.py on a Chinese PDF outputs a string of English characters instead of the expected Chinese text. How can I fix this?

I ran pdf2txt.py on a PDF whose content is in Chinese, and the result is shown below: ![image](https://github.com/FlagOpen/FlagData/assets/30501264/ea6a8caf-a67b-4766-8ff7-92226c5c4d67)

LilySys updated 3 months ago

isuruwa/PDF-TOOLBOX #3

Add pdfminer to requirements - and bug report.

# Add pdfminer to requirements. - pdfminer isn't listed in requirements.txt. # When running the PDF-to-Text option I get ```sh TypeError: TextConverter.__init__() got an unexpected keyword argument …

Theblackcat98 updated 4 months ago

360 results for pdf2txt
