-
Hi,
When I upload a PDF file, I get the following error instead of the embeddings being created. I also tried installing poppler with pip, without success. I am trying this on Windows 11. Can y…
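
If the failing step shells out to poppler (as PDF-to-image wrappers such as pdf2image do), a common cause is that the poppler command-line utilities are not on PATH. Poppler is a native library, so a pip install may not provide its executables. The sketch below checks for two utilities, `pdftoppm` and `pdfinfo`, before any processing starts. It assumes these are the tools your pipeline needs, so adjust the tuple if yours uses different ones.

```python
import shutil

# Poppler utilities that PDF-to-image wrappers commonly call (assumption).
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    # On Windows, shutil.which also tries PATHEXT suffixes such as .exe.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
        print("Add the directory that contains the poppler binaries to PATH.")
    else:
        print("poppler utilities found")
```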
-
**Describe the bug**
When running partition on a two-column PDF, text extraction puts characters in the wrong position
**To Reproduce**
[two_col.pdf](https://github.com/user-attachments/files/16037…
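
A common workaround for column interleaving is to order text blocks column by column instead of strictly top to bottom. The sketch below assumes each block is a hypothetical `(x0, y0, text)` tuple, with `y0` measured from the top of the page, and splits the page at its horizontal midpoint. It is an illustration only and does not describe how `partition` orders elements internally.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """Hypothetical text block: left edge x0, top edge y0 (from page top), text."""
    x0: float
    y0: float
    text: str


def two_column_order(blocks: List[Block], page_width: float) -> List[Block]:
    """Order blocks left column first, then right column, each top to bottom."""
    midpoint = page_width / 2
    # Blocks that start left of the midpoint are treated as the left column.
    left = [b for b in blocks if b.x0 < midpoint]
    right = [b for b in blocks if b.x0 >= midpoint]
    return (sorted(left, key=lambda b: (b.y0, b.x0))
            + sorted(right, key=lambda b: (b.y0, b.x0)))


blocks = [
    Block(50, 100, "Left 1"),
    Block(320, 100, "Right 1"),
    Block(50, 140, "Left 2"),
    Block(320, 140, "Right 2"),
]
print([b.text for b in two_column_order(blocks, page_width=612)])
# → ['Left 1', 'Left 2', 'Right 1', 'Right 2']
```

Elements that span both columns, such as titles, would need separate handling before this split.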
-
ZotFile has this. I have no idea how widely it's used, or what it's used for. Maybe people extract a ToC and then use that as a template for notes?
Is this the same (in content, not in usage) as th…
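
If the use case is turning an extracted ToC into a notes template, the conversion step itself is small. The sketch below assumes the ToC has already been extracted as a list of hypothetical `(level, title)` pairs, where level 1 is the top level. It produces a Markdown skeleton with one heading and one notes placeholder per entry.

```python
from typing import Iterable, Tuple


def toc_to_notes_template(toc: Iterable[Tuple[int, str]]) -> str:
    """Render (level, title) ToC entries as Markdown headings with a notes slot."""
    lines = []
    for level, title in toc:
        # Markdown heading levels run from 1 to 6, so out-of-range levels are clamped.
        lines.append("#" * min(max(level, 1), 6) + " " + title)
        lines.append("")
        lines.append("Notes:")
        lines.append("")
    return "\n".join(lines)


toc = [(1, "Introduction"), (2, "Background"), (1, "Methods")]
print(toc_to_notes_template(toc))
```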
-
I'm working with Mo Hayat with WashingtonAbstract and we're hoping to both utilize and contribute to OpenStates scrapers/API/Data. I only recently began playing around with the various repos available…
-
We need to check that we're doing a good enough job with what we have, and we should look at using additional tools to improve metadata extraction from PDFs.
- [GROBID (or Grobid) means GeneR…
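
GROBID returns its results as TEI XML, so comparing it against our current metadata would begin by pulling fields out of that XML. The sketch below uses only the standard library. It assumes the main title sits at `teiHeader/fileDesc/titleStmt/title` in the TEI namespace, which should be checked against real GROBID responses. The sample string is hand-written for illustration and is not actual GROBID output.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_title(tei_xml: str) -> Optional[str]:
    """Return the stripped title text from a TEI header, or None if absent."""
    root = ET.fromstring(tei_xml)
    # Assumed location of the title in header output (verify against real responses).
    title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    if title is None:
        return None
    # itertext() also collects text nested inside child elements.
    text = "".join(title.itertext()).strip()
    return text or None


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">An Example Title</title></titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""
print(tei_title(sample))  # → An Example Title
```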
-
When trying to parse the PDF at http://www.ada.gov/hospcombrprt.pdf, I get the following error:
```
pdfdocument.py", line 348, in _initialize_password
raise PDFEncryptionError('Unknown algorithm: par…
```
-
**Cause of Bug**
When extracting text from the PDF with different tools, every tool's output renders the combination "ti" as " " and "ft" as " "
**Code Snippet used for generation of pdf…
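
Ligatures may explain the lost characters. Unicode defines compatibility code points for a few Latin ligatures (U+FB00 to U+FB06, for example "ﬁ", "ﬂ", "ﬀ"), and NFKC normalization expands them back into plain letters. Unicode has no such code point for "ti" or "ft". If a font draws those pairs as single glyphs without a Unicode mapping, an extractor may have nothing to output for them. That explanation is an assumption, not something the report confirms. The sketch below expands only the standard ligatures and leaves the rest of the text untouched. It cannot recover glyphs that have no mapping.

```python
import unicodedata

# Latin ligatures in the Alphabetic Presentation Forms block: U+FB00..U+FB06.
LATIN_LIGATURES = {
    chr(cp): unicodedata.normalize("NFKC", chr(cp)) for cp in range(0xFB00, 0xFB07)
}
_TABLE = str.maketrans(LATIN_LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace standard Unicode Latin ligature characters with their letters."""
    return text.translate(_TABLE)


print(expand_ligatures("e\ufb03cient \ufb01le"))  # → efficient file
```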
-
Hello team,
I am trying to execute the demo notebook (pdf_data_extraction) and am getting an error while importing:
**from src.data.s3_communication import S3Communication**
ImportError: c…
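
An `ImportError` on a `src.…` import inside a notebook often means that the repository root (the directory containing `src/`) is not on `sys.path`. This can happen when Jupyter was started from the notebook's own folder. The full error is cut off here, so this diagnosis is an assumption. The sketch below walks up from the current directory to the first ancestor containing a `src` directory and prepends it to `sys.path`. Both helper names are hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Return the first directory at or above `start` that contains a `marker` dir."""
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None


def add_repo_root_to_path(start: Optional[Path] = None) -> Optional[Path]:
    """Prepend the detected repository root to sys.path (no-op if not found)."""
    root = find_repo_root((start or Path.cwd()).resolve())
    if root is not None and str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# In the notebook, run this before `from src.data.s3_communication import S3Communication`:
add_repo_root_to_path()
```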
-
Text extraction from the PDFs is not always 100% accurate because the gazette documents always have two columns of text, and when the columns are too close to each other, sentences or words can be mixed up with t…
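
When the columns sit close together, a fixed midpoint split can cut through words. A different approach is to locate the gutter, which is the widest horizontal band that no word crosses. The sketch below assumes words arrive as hypothetical `(x0, x1)` horizontal extents in page units from whichever extractor is in use. It returns the centre of the gutter, or `None` if no gap is at least `min_gap` wide.

```python
from typing import List, Optional, Tuple


def find_gutter(spans: List[Tuple[float, float]], min_gap: float = 5.0) -> Optional[float]:
    """Return the x-coordinate centre of the widest gap between word extents.

    Returns None when no gap is at least `min_gap` wide (e.g. a single-column page).
    """
    if not spans:
        return None
    # Merge overlapping extents into covered intervals, left to right.
    merged: List[List[float]] = []
    for x0, x1 in sorted(spans):
        if merged and x0 <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])
    best_gap, best_centre = 0.0, None
    # Compare each covered interval with the next one to measure the gap between them.
    for (_, left_end), (right_start, _) in zip(merged, merged[1:]):
        gap = right_start - left_end
        if gap >= min_gap and gap > best_gap:
            best_gap, best_centre = gap, (left_end + right_start) / 2
    return best_centre


words = [(50, 90), (95, 280), (60, 275), (292, 330), (335, 560)]
print(find_gutter(words))  # → 286.0
```

Words whose `x0` lies left of the returned coordinate can then be assigned to the left column.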
-
pdftotext (https://www.xpdfreader.com/pdftotext-man.html, https://pypi.org/project/pdftotext/) could be used for PDF text extraction. I don't know whether the latter wraps the former or is something e…
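
The `pdftotext` executable accepts `-layout` to keep the physical layout of the text and `-enc UTF-8` to set the output encoding. An output file name of `-` sends the text to stdout. The sketch below calls the executable through `subprocess` and assumes `pdftotext` is on PATH. It does not use the PyPI `pdftotext` package.

```python
import subprocess
from typing import List


def pdftotext_command(pdf_path: str, layout: bool = True) -> List[str]:
    """Build a pdftotext invocation that writes UTF-8 text to stdout ("-")."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout keeps the physical layout, so side-by-side columns stay apart.
        cmd.append("-layout")
    return cmd + [pdf_path, "-"]


def extract_text(pdf_path: str, layout: bool = True) -> str:
    """Run pdftotext (must be on PATH) and return its stdout as text."""
    result = subprocess.run(
        pdftotext_command(pdf_path, layout),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

With `-layout`, the two columns come out side by side on each line and still need to be separated. Running with and without it on the same file is a quick way to see which mode handles a given layout better.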
jorsn updated
3 years ago