-
> DjVu is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, indexed color images, and photographs. [...] DjVu has …
-
## Is your feature request related to a problem? Please describe.
When saving an image to a text file or selecting the Scan to Text File option and selecting a scanned book for text extraction using …
-
**Describe the bug**
I am evaluating the UnstructuredClient for processing PDF documents and am encountering an issue with the Greek language text extraction. When I attempt to extract text from PDF …
-
Suggested changes to layout
![Data Problem Report page-page-001.jpg](https://images.zenhubusercontent.com/5cab6cc818447254e461e741/4d8032d2-25f2-4d07-bd68-bea53f7a0384)
-
Stemming from [an email discussion](https://groups.google.com/g/islandora-dev/c/oPr1ZsJx-HA):
Hypercube currently uses pdftotext to extract text embedded in a PDF OR tesseract to perform OCR on …
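
For readers unfamiliar with the two tools, here is a minimal sketch of that either/or choice. It is not Hypercube's actual code: the function names `build_command` and `extract_text` are hypothetical, and choosing the tool by file extension is an assumption made only for illustration. The two command lines themselves are the standard stdout forms of `pdftotext` and `tesseract`.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(path: str) -> list[str]:
    """Return the argv for the extractor that matches the file type.

    Assumption: PDFs go to pdftotext, everything else is treated as an
    image and goes to tesseract. The real service may decide differently.
    """
    if Path(path).suffix.lower() == ".pdf":
        # `pdftotext FILE -` writes the embedded text layer to stdout.
        return ["pdftotext", path, "-"]
    # `tesseract FILE stdout` runs OCR and writes the recognized text to stdout.
    return ["tesseract", path, "stdout"]


def extract_text(path: str) -> str:
    """Run the selected tool and return whatever text it prints."""
    cmd = build_command(path)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} is not installed")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping the command selection in its own function makes the routing rule easy to inspect without either binary installed.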
-
**Is your feature request related to a problem? Please describe.**
I've been using Google Document AI for text extraction from scanned documents, and it's been working well in terms of extractin…
-
Type: Bug
## Issue Description ##
The C# extension cannot handle code actions when there are diagnostics from the Semgrep Extension included in the request.
Hovering over a Semgrep diagnosti…
-
#### Environment details
- OS type and version: I use Windows locally, and Linux in production
- Python version: I use the latest release of Python, which is currently Python 3.12.4
- pip version:…
-
https://oldinsurancemaps.net/resource/328 is an example of a sheet that isn't a map. Currently, this sheet would sit in the unprepared area, but that's a difference between a page not yet reviewed but…
-
Hi, I would like to understand how the current implementation handles HNSW + filtering.
Imagine you have a table:
```sql
CREATE TABLE document (
id UUID PRIMARY KEY,
text TEXT NOT NULL,
…