-
### Description
Because of an extant bug in PyPDF2 (https://github.com/mstamy2/PyPDF2/issues/193), trying to read the outline of a file generated by wkhtmltopdf results in an error. This means that l…
-
Hello, I'm in the process of fine-tuning a Large Language Model (LLM) for an NGO, and I need to construct an instruction dataset from .pdf and .docx documents that contain textual information.
The obje…
-
Hello,
Thank you for developing this code; I know it will be invaluable once I can get it working. I'm working on a Mac and trying to execute the "menextract2pdf_overwrite.sh" command after navigat…
-
Thank you Alejandro!
I got the chatbot to work on my Windows 11 PC with the following requirement.txt:
langchain==0.0.166
PyPDF2==3.0.1
python-dotenv==1.0.0
streamlit==1.18.1
faiss-cpu==1.7.4
alta…
-
I can see no reason for this script to output intermediate files. The code should be refactored to write straight to out.pdf.
-
One of the `pypdfocr` prerequisites is evernote.
When running `pip3 install pypdfocr` I get the following exception:
```
$ pip3 install pypdfocr
...
Collecting evernote (from pypdfocr)
Using cached ev…
-
Excellent project, thanks for making this for those of us who struggle to put these things together.
When I try to get it to read my files, I get the error: PdfFileReader is deprecated and was removed i…
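
For anyone hitting the same message: PyPDF2 3.0.0 removed the old camelCase names, so code written against earlier releases needs the newer ones. Below is a minimal sketch of the renames I'm sure of, plus a small reader helper. It assumes PyPDF2 >= 3.0 is installed; `RENAMES` and `read_page_text` are hypothetical names used only for illustration and are not part of this project.

```python
# Old PyPDF2 (< 3.0) spellings mapped to their PyPDF2 >= 3.0 replacements.
# The mapping is illustrative and not exhaustive.
RENAMES = {
    "PdfFileReader": "PdfReader",
    "PdfFileWriter": "PdfWriter",
    "PdfFileMerger": "PdfMerger",
    "reader.getPage(n)": "reader.pages[n]",
    "reader.getNumPages()": "len(reader.pages)",
    "page.extractText()": "page.extract_text()",
}


def read_page_text(path, page_index=0):
    """Return the extracted text of one page using the PyPDF2 >= 3.0 API."""
    # Imported lazily so the module still loads where PyPDF2 is absent.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return reader.pages[page_index].extract_text()
```

If changing the calling code is not an option, pinning an older PyPDF2 release (one before 3.0) in the requirements file is the other common workaround.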
-
Is it possible to extract the title of a presentation and embed it in the exported PDF? I've tested a few of the generated PDFs using PyPDF2, and none of them have title metadata. I guess this may be …
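
For reference, this is roughly how the check and a possible post-processing step look with PyPDF2. It is only a sketch, assuming PyPDF2 >= 3.0; `title_metadata`, `read_title`, and `copy_with_title` are hypothetical helper names, and it says nothing about how the exporter itself should set the title.

```python
def title_metadata(title):
    """Build the document-information dictionary entry for a PDF title."""
    return {"/Title": title}


def read_title(path):
    """Return the /Title metadata of a PDF, or None when it is missing."""
    from PyPDF2 import PdfReader  # lazy import; assumes PyPDF2 >= 3.0

    meta = PdfReader(path).metadata  # None if the file has no info dictionary
    return meta.title if meta is not None else None


def copy_with_title(src_path, dst_path, title):
    """Copy every page of src_path into dst_path and set the title metadata."""
    from PyPDF2 import PdfReader, PdfWriter  # lazy import; assumes PyPDF2 >= 3.0

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.add_metadata(title_metadata(title))
    with open(dst_path, "wb") as fh:
        writer.write(fh)
```

Calling `read_title` on one of the exported files is how I would confirm whether the title is absent; `copy_with_title` could serve as a stopgap until the export sets it directly.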
-
At least with .parquet, [there are opportunities](https://wesmckinney.com/blog/python-parquet-multithreading/) to improve speed and reduce disk usage with dataframe binaries via pyarrow's built-in thr…
-
Many years ago I used pypdf to create links for a book of maps for our storm sewer system. I had an index page that had links to all of the other pages and each page had links to the page with the map…