D-Star-AI / dsRAG

High-performance retrieval engine for unstructured data
MIT License
852 stars, 61 forks

Does it only support text? #58

Closed: Sere1nz closed this issue 1 month ago

Sere1nz commented 1 month ago

What about tables and images in PDFs?

zmccormick7 commented 1 month ago

We'll be adding support for this very soon.

Sere1nz commented 1 month ago

Thank you! You might also consider extracting text and images from scanned PDFs using OCR.

zmccormick7 commented 4 weeks ago

The method we're working on uses a VLM to extract both text and image bounding boxes, so it should work just as well for scanned PDFs.
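
Editor's illustration (not part of the original comment, and not dsRAG's actual implementation): the thread gives no details about the VLM's output, so the sketch below only shows one plausible way to consume it. It assumes the model returns a JSON list of page elements, each with a `type`, a `bbox` in page pixels, and optional `content`. The schema, the `PageElement` class, and `parse_vlm_elements` are all hypothetical names introduced here.

```python
import json
from dataclasses import dataclass


@dataclass
class PageElement:
    kind: str     # "text" or "image" (hypothetical labels)
    bbox: tuple   # (x0, y0, x1, y1) in page pixels
    content: str  # extracted text, or a description for images


def parse_vlm_elements(raw_json: str) -> list[PageElement]:
    """Parse an assumed VLM JSON response into typed page elements."""
    elements = []
    for item in json.loads(raw_json):
        x0, y0, x1, y1 = item["bbox"]
        if x1 <= x0 or y1 <= y0:
            continue  # skip degenerate (zero-area or inverted) boxes
        elements.append(
            PageElement(item["type"], (x0, y0, x1, y1), item.get("content", ""))
        )
    return elements


# Made-up model output for a single scanned page.
sample = json.dumps([
    {"type": "text", "bbox": [50, 40, 550, 120], "content": "Quarterly results"},
    {"type": "image", "bbox": [50, 140, 550, 400], "content": "Bar chart of revenue"},
    {"type": "image", "bbox": [10, 10, 10, 30]},  # zero width, dropped
])

elements = parse_vlm_elements(sample)
text_blocks = [e for e in elements if e.kind == "text"]
image_boxes = [e.bbox for e in elements if e.kind == "image"]
```

Because the elements come from the rendered page rather than an embedded text layer, the same parsing path would apply to scanned and digitally generated PDFs alike, which matches the point made in the comment above.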