-
Hello!
I'm trying to extract text from PDFs using Poppler, PDFBox, etc., but none of them can handle embedded fonts that have no cmap.
When I open those embedded fonts with FontForge, I can see the subset…
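One way to check which embedded fonts lack a Unicode mapping is Poppler's `pdffonts`, whose `uni` column reports whether a font has a ToUnicode CMap. The sketch below assumes that the missing "cmap" in this report is that ToUnicode map (the report does not say which map is meant). It also assumes `pdffonts`' default column layout. The function name `fonts_without_tounicode` is hypothetical.

```bash
#!/bin/bash
# Sketch: list fonts in a PDF that have no ToUnicode CMap, using pdffonts
# (poppler-utils). Extractors often cannot map text drawn with such fonts
# back to Unicode.

# fonts_without_tounicode: read pdffonts output on stdin and print the name of
# every font whose "uni" column is "no". Assumes the default layout
#   name  type  encoding  emb sub uni  object-number generation
# and counts columns from the right, because the type column may contain
# spaces (e.g. "CID TrueType"). The first two lines are the header and dashes.
fonts_without_tounicode() {
  awk 'NR > 2 && NF >= 6 && $(NF - 2) == "no" { print $1 }'
}

# When given a PDF path, run pdffonts on it and filter the result.
if [ "$#" -gt 0 ]; then
  pdffonts "$1" | fonts_without_tounicode
fi
```

Fonts listed here are the ones where extracted text is most likely to come out garbled or empty.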
-
We could try to extract metadata from PDFs automatically. There are some tools for that:
- http://www.dlib.org/dlib/july12/kern/07kern.html
- http://knowminer.know-center.tugraz.at/team-beam-meta-dat…
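The linked tools infer metadata such as title and authors from the page content. A much simpler first step, which is a different technique, is to read the metadata already stored in the PDF's document information dictionary with Poppler's `pdfinfo`. This sketch assumes `pdfinfo`'s usual `Key:   value` output. The helper names are hypothetical. Many PDFs leave these fields empty, so this is a complement to content-based extraction, not a replacement.

```bash
#!/bin/bash
# Sketch: read embedded document-information metadata from a PDF with pdfinfo
# (poppler-utils). This only returns what the PDF's producer stored; it does
# not infer metadata from the page text.

# info_field FIELD: read pdfinfo-style "Key:   value" lines on stdin and print
# the value of FIELD, or nothing if the field is absent.
info_field() {
  local field="$1" line key value
  while IFS= read -r line; do
    key="${line%%:*}"             # text before the first colon
    if [ "$key" = "$field" ]; then
      value="${line#*:}"          # text after the first colon
      # drop the alignment spaces pdfinfo puts after the colon
      value="${value#"${value%%[![:space:]]*}"}"
      printf '%s\n' "$value"
      return 0
    fi
  done
}

# pdf_metadata PDF: print title and author as key=value lines.
pdf_metadata() {
  local info
  info="$(pdfinfo "$1")" || return 1
  printf 'title=%s\n' "$(printf '%s\n' "$info" | info_field Title)"
  printf 'author=%s\n' "$(printf '%s\n' "$info" | info_field Author)"
}

if [ "$#" -gt 0 ]; then
  pdf_metadata "$1"
fi
```

Splitting only on the first colon keeps titles such as `Foo: A Study` intact.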
Updated by mitar 9 years ago.
-
- What is your OS and architecture?
```
Debian x86_64
```
- What is your Java version (`java --version`)?
```
java 17.0.11 2024-04-16 LTS
Java(TM) SE Runtime Environment (build 17.0.11+7-LT…
```
-
It seems to have stopped working from the 2020-03-27 10am release onward.
-
Hi folks! We love Verba. Does the project support, or plan to support, pluggable retrievers? We are building an open-source, reliable extraction and embedding engine: https://getindexify.ai. We plan on s…
-
**Describe the bug**
I am evaluating the UnstructuredClient for processing PDF documents and am encountering an issue with text extraction for Greek. When I attempt to extract text from PDF …
-
```
#!/bin/bash
# WF 2020-06-10
# get text from pdf
# check that pdftotext (part of poppler-utils) is installed
if ! command -v pdftotext > /dev/null
then
echo "you might want to install pdftotext e.g. with sudo apt-get install poppler-utils" 1>&…
```
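The script above is truncated in the middle of its dependency check. A minimal, self-contained sketch of the same idea, converting each PDF argument to a `.txt` file with `pdftotext`, could look like the following. The function names and the `-layout` option are assumptions, not the original author's code.

```bash
#!/bin/bash
# Sketch: convert each PDF given on the command line to a .txt file next to it,
# using pdftotext from poppler-utils.

# check_tool NAME: succeed if NAME is on the PATH, otherwise print an install
# hint to stderr and fail.
check_tool() {
  if ! command -v "$1" > /dev/null 2>&1; then
    echo "you might want to install $1 e.g. with sudo apt-get install poppler-utils" 1>&2
    return 1
  fi
}

# txt_name PDF: map foo.pdf (any letter case) to foo.txt
txt_name() {
  printf '%s\n' "${1%.[pP][dD][fF]}.txt"
}

main() {
  check_tool pdftotext || return 1
  local pdf
  for pdf in "$@"; do
    # -layout tries to keep the physical layout (columns, indentation)
    pdftotext -layout "$pdf" "$(txt_name "$pdf")" || echo "failed: $pdf" 1>&2
  done
}

# Only run when PDF arguments are given, so the functions can also be sourced.
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

Usage: `./pdf2txt.sh *.pdf` (hypothetical file name) writes one `.txt` file per PDF and reports any failures on stderr.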
-
## User story
As a dev, I want to make it easier to re-populate the database with PDF extracts so that we don't have to re-process PDFs every time we clear, migrate, or otherwise change the databas…
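One way to approach this story is to keep extracts outside the database in a cache keyed by a hash of each PDF's bytes. That way, a cleared or migrated database can be refilled without re-running extraction. The sketch below is hypothetical: the cache location, the `<sha256>.txt` layout, and the function names are assumptions. Any extractor that prints text to stdout can be plugged in.

```bash
#!/bin/bash
# Sketch: content-addressed cache for PDF text extracts. The cache directory
# and file layout (<sha256 of the PDF>.txt) are hypothetical choices.

CACHE_DIR="${CACHE_DIR:-$HOME/.cache/pdf-extracts}"

# cache_key FILE: SHA-256 of the file's bytes, so renamed copies share an entry
cache_key() {
  sha256sum < "$1" | cut -d' ' -f1
}

# cached_extract FILE CMD [ARGS...]: print the cached extract for FILE; on a
# cache miss, run "CMD ARGS... FILE" (which must print text to stdout), store
# its output, then print it.
cached_extract() {
  local pdf="$1" key out
  shift
  [ -r "$pdf" ] || return 1
  key="$(cache_key "$pdf")"
  [ -n "$key" ] || return 1
  out="$CACHE_DIR/$key.txt"
  if [ ! -f "$out" ]; then
    mkdir -p "$CACHE_DIR" || return 1
    # write to a temporary file first so a failed extraction leaves no entry
    if ! "$@" "$pdf" > "$out.tmp"; then
      rm -f "$out.tmp"
      return 1
    fi
    mv "$out.tmp" "$out"
  fi
  cat "$out"
}

# Example extractor: pdftotext writes to stdout when the output file is "-".
extract_text() {
  pdftotext -layout "$1" -
}
```

Usage: `cached_extract paper.pdf extract_text`. Re-populating the database then only reads the cache, and extraction runs again only for PDFs whose bytes have changed.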
-
### Description of the bug
`Document.select()` does not work on some particular kinds of PDF files.
I want to extract text from PDF files. If a PDF has more than 30 pages, I extract the first 30 pages from the…
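If the goal is only the text of the first 30 pages, one alternative avoids splitting the document with `Document.select()` altogether: give `pdftotext` a page range with its `-f` (first page) and `-l` (last page) options. This swaps in a different tool (poppler-utils) and is not a fix for the reported bug. The function name is hypothetical. As far as we know, `pdftotext` clamps `-l` to the document's page count, so shorter PDFs need no special handling.

```bash
#!/bin/bash
# Sketch: print the text of at most the first N pages of a PDF using
# pdftotext (poppler-utils) instead of splitting the document first.

# first_pages_text PDF [N]: text of pages 1..N (default 30) on stdout.
# -f/-l select the first and last page; "-" as output file means stdout.
# pdftotext is expected to clamp -l to the page count, so PDFs with fewer
# than N pages need no special case.
first_pages_text() {
  local pdf="$1" max_pages="${2:-30}"
  pdftotext -f 1 -l "$max_pages" "$pdf" -
}

if [ "$#" -gt 0 ]; then
  first_pages_text "$@"
fi
```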
-
I am using this code to extract images from PDFs. It works fine for some images, but for others it changes the colors of the image.
For example, I have an image that has…