-
Recently I ran into a particular kind of PDF file from which I cannot extract text because the library throws an exception.
## Environment
Which environment were you using when you encountered t…
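Until the underlying bug is fixed, one possible workaround is to extract text page by page so that a single failing page does not abort the whole document. The following is a minimal sketch. It assumes only that each page object has an `extract_text()` method, as pypdf's page objects do. The function name `extract_pages_safely` is hypothetical.

```python
from typing import Dict, Iterable, List, Protocol, Tuple


class PageLike(Protocol):
    """Anything with an extract_text() method, e.g. a pypdf page object."""

    def extract_text(self) -> str: ...


def extract_pages_safely(pages: Iterable[PageLike]) -> Tuple[List[str], Dict[int, str]]:
    """Extract text page by page, recording failures instead of aborting.

    Returns one string per page (failed pages contribute "") and a mapping
    of page index -> "ExceptionType: message" for the pages that raised.
    """
    texts: List[str] = []
    errors: Dict[int, str] = {}
    for index, page in enumerate(pages):
        try:
            texts.append(page.extract_text())
        except Exception as exc:  # deliberately broad: isolate one bad page
            texts.append("")
            errors[index] = f"{type(exc).__name__}: {exc}"
    return texts, errors
```

With pypdf this would be called as `extract_pages_safely(PdfReader("file.pdf").pages)`, where the path is a placeholder. This does not help if opening the file or walking the page tree already raises.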
-
When I parse PDF files, I want to skip tables and images, because they can disrupt the paragraph structure.
## Environment
```
$ python -m platform
Linux-5.15.0-69-generic-x86_64-with-debian-b…
```
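Images typically contribute no characters to the string returned by text extraction, so tables are usually the harder part. One rough heuristic is to post-filter the extracted text. It drops lines that split into several "columns" on runs of two or more spaces, which works best with layout-preserving extraction. If your version supports it, pypdf's `page.extract_text(extraction_mode="layout")` provides this. This is a sketch, not a table detector. The function name and the `min_columns` threshold are hypothetical choices.

```python
import re

# Two or more consecutive spaces are treated as a column gap (heuristic).
_COLUMN_GAP = re.compile(r" {2,}")


def drop_tabular_lines(text: str, min_columns: int = 3) -> str:
    """Remove lines that look like table rows.

    A line counts as tabular when splitting it on runs of 2+ spaces yields
    at least `min_columns` non-empty cells. It will miss tables extracted
    without column spacing and may drop prose that contains wide gaps.
    """
    kept = []
    for line in text.splitlines():
        cells = [cell for cell in _COLUMN_GAP.split(line.strip()) if cell]
        if len(cells) < min_columns:
            kept.append(line)
    return "\n".join(kept)
```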
-
Trying to run the tests when samples are not available
## Environment
Which environment were you using when you encountered the problem?
```bash
$ python -m platform
Linux-5.10.0-27-amd64-x…
```
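A common way to let a test suite run without optional sample files is to skip, rather than fail, the tests that need them. Below is a minimal sketch using the standard library's `unittest.skipUnless`. The `SAMPLE_DIR` location and the `SAMPLE_DIR` environment variable are hypothetical and not taken from any project's actual layout.

```python
import os
import unittest
from pathlib import Path

# Hypothetical location of the optional sample files; adjust to your checkout.
SAMPLE_DIR = Path(os.environ.get("SAMPLE_DIR", "sample-files"))


@unittest.skipUnless(SAMPLE_DIR.is_dir(), f"sample files not found in {SAMPLE_DIR}")
class SampleFileTests(unittest.TestCase):
    def test_samples_present(self):
        # Placeholder body: a real test would open files from SAMPLE_DIR.
        self.assertTrue(any(SAMPLE_DIR.iterdir()))
```

When the directory is missing, every test in the class is reported as skipped with the given reason instead of erroring out.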
-
I have a PDF with an indexed color image of the company logo on every page. See attached for a similar document using an indexed version of the PDFsharp logo and which reproduces the issue ([indexed-c…
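For context, an image in a PDF `/Indexed` colour space (`[/Indexed base hival lookup]`) stores palette indices rather than colours. Each index selects one entry of the lookup table, and each entry holds one byte per component of the base space (3 for DeviceRGB). The following sketch shows that expansion under the simplifying assumption of 8 bits per index. Real images may pack 1, 2 or 4 bits per index. The function name is hypothetical.

```python
def expand_indexed(indices: bytes, lookup: bytes, components: int = 3, hival: int = 255) -> bytes:
    """Map 8-bit palette indices to base colour-space samples.

    Index i selects lookup[i * components:(i + 1) * components].
    Indices above hival are clamped to hival, the nearest valid value.
    """
    expected = (hival + 1) * components
    if len(lookup) < expected:
        raise ValueError(f"lookup has {len(lookup)} bytes, expected {expected}")
    out = bytearray()
    for index in indices:
        index = min(index, hival)
        out += lookup[index * components:(index + 1) * components]
    return bytes(out)
```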
-
I have just installed the package and tried uploading a file via form data. Passing the uploaded file to `PdfReader` gives an error:
> path should be string, bytes, os.PathLike or integer, not FileStorage"…
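The error says `PdfReader` received neither a path nor a readable stream. A minimal sketch of one fix is to unwrap the upload into an in-memory binary stream first. This assumes a Werkzeug/Flask `FileStorage`, which exposes the underlying file as `.stream`. The helper name `as_binary_stream` is hypothetical.

```python
import io


def as_binary_stream(upload: object) -> io.BytesIO:
    """Turn an uploaded file into a seekable in-memory binary stream.

    - bytes/bytearray are wrapped in io.BytesIO;
    - objects with a `.stream` attribute (e.g. Werkzeug's FileStorage)
      are unwrapped, then read;
    - any other object with a `.read()` method is read directly.
    A BytesIO is always seekable, which PDF parsing needs because the
    cross-reference data sits at the end of the file.
    """
    if isinstance(upload, (bytes, bytearray)):
        return io.BytesIO(bytes(upload))
    stream = getattr(upload, "stream", upload)
    if not hasattr(stream, "read"):
        raise TypeError(f"cannot read PDF data from {type(upload).__name__}")
    return io.BytesIO(stream.read())
```

In a Flask view this might be used as `PdfReader(as_binary_stream(request.files["file"]))`, where `"file"` is a placeholder form-field name.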
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
I want to query the RAG system using multiple questions/queries concurrently.
I am us…
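If the query engine offers an async query method (LlamaIndex query engines expose `aquery`, as far as I know), several questions can be sent concurrently with `asyncio.gather`. Below is a minimal sketch with a semaphore to cap concurrency. The `aquery` parameter is any coroutine function that answers one question. The names `run_queries` and `max_concurrency` are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def run_queries(
    questions: List[str],
    aquery: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 4,
) -> List[Any]:
    """Run several queries concurrently, preserving input order."""
    # Caps how many queries are in flight at once (e.g. for rate limits).
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(question: str) -> Any:
        async with semaphore:
            return await aquery(question)

    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(one(q) for q in questions))
```

With a real engine, this might be called as `asyncio.run(run_queries(questions, query_engine.aquery))`. The results are then the engine's response objects.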
-
Using parentheses causes "content stream" to be displayed
## Environment
```bash
$ python -m platform
Linux-6.2.0-34-generic-x86_64-with-glibc2.35
$ python -c "import pypdf;print(pypdf._d…
```
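The report is truncated, so the cause is not known here. One place where parentheses matter in PDF syntax is literal strings, which are delimited by `(` and `)`. Unbalanced parentheses inside such a string must be escaped with a backslash, or the string ends early and the rest of the content stream is misparsed. A minimal sketch of that escaping follows. The function name is hypothetical.

```python
def escape_pdf_literal(text: str) -> str:
    """Escape text for use inside a PDF literal string "( ... )".

    Backslash, "(" and ")" are each prefixed with a backslash so that
    parentheses in the text cannot terminate the string early.
    """
    return (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace("(", "\\(")
        .replace(")", "\\)")
    )
```

For example, `f"({escape_pdf_literal('x (y')}) Tj"` yields a `Tj` operand whose single unbalanced parenthesis is escaped.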
-
**Describe the bug**
For certain documents, putting a visible signature on a page whose /Contents is an array in which one or more of the objects has the same number as an object in an …
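The general way to avoid such object-number collisions when adding objects to an existing PDF is to assign every incoming object a fresh number above the highest one already in use. Every reference inside the incoming objects is then rewritten through that mapping. The sketch below shows only the numbering step. It is not the code of any particular signing library, and the function name is hypothetical.

```python
from typing import Dict, Iterable


def renumber_objects(incoming: Iterable[int], existing: Iterable[int]) -> Dict[int, int]:
    """Map each incoming object number to an unused number.

    New numbers start just above the highest existing number, so no
    incoming object can overwrite an existing one. References inside the
    incoming objects must then be rewritten using the returned mapping.
    """
    next_free = max(existing, default=0) + 1
    mapping: Dict[int, int] = {}
    for old in sorted(set(incoming)):
        mapping[old] = next_free
        next_free += 1
    return mapping
```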
-
Hello,
To get a more accurate retriever, I need to add some information to the metadata (in my case, the document title and subject).
To do that, I propose adding the following method:
```
# Add title and…
```
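The proposed method is cut off above, so the following is not it. It is a hedged sketch of how the two fields could be collected from a PDF's document information so they can be merged into each document's metadata dict. It assumes an object with `title` and `subject` attributes, which pypdf's `DocumentInformation` provides. The helper name is hypothetical.

```python
from typing import Dict, Optional


def title_subject_metadata(info: Optional[object]) -> Dict[str, str]:
    """Collect title and subject from a PDF's document information.

    `info` may be None (no information dictionary), and either attribute
    may be missing or None; only non-empty values are returned.
    """
    metadata: Dict[str, str] = {}
    if info is None:
        return metadata
    for key in ("title", "subject"):
        value = getattr(info, key, None)
        if value:
            metadata[key] = str(value).strip()
    return metadata
```

With pypdf, `title_subject_metadata(PdfReader(path).metadata)` would give a dict that can be passed as extra metadata when building documents for indexing.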
-
I am new to Python and I am developing a program that takes a PDF file as input and converts it into text. I am using Python 3 and have tried both the PyPDF2 and PDFMiner.six packages. For the first PDF file it …
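Since different extraction libraries succeed on different files, one pragmatic pattern is to try several backends in order and keep the first one that returns non-blank text. A minimal sketch follows. The extractor callables are hypothetical stand-ins; a real one could wrap `pdfminer.high_level.extract_text`, which takes a file path and returns a string.

```python
from typing import Callable, Optional, Sequence, Tuple

Extractor = Callable[[str], str]


def extract_with_fallback(
    path: str, extractors: Sequence[Tuple[str, Extractor]]
) -> Tuple[Optional[str], str]:
    """Try each (name, extractor) in order; return the first non-blank text.

    Returns (name_of_extractor_used, text), or (None, "") if every extractor
    raised or produced only whitespace (typical of scanned, image-only PDFs,
    which need OCR instead).
    """
    for name, extract in extractors:
        try:
            text = extract(path)
        except Exception:
            continue  # this backend could not handle the file; try the next
        if text and text.strip():
            return name, text
    return None, ""
```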