Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License
24.12k stars, 2.42k forks

[FEAT] Vision support in PDF parsing #510

Open · huicewang opened 9 months ago

huicewang commented 9 months ago

If the knowledge base consists of images, or of PDF files that contain images, is that content currently unsupported? Will OCR (optical character recognition) for images be supported in the future? If so, when can we expect it? Thank you.

timothycarambat commented 9 months ago

We currently have no timeline or plan to implement full OCR scanning of PDFs that mix text and images unless it can be done independently of the model the instance has selected. Not all model providers support any form of vision, and even for those that do, we do not use an LLM to parse PDF text.

Will mark as a feature request.

fmg-tomdifulvio commented 2 months ago

Could you allow entering an API key for Azure Document Intelligence? Or even for GPT-4o?

oatmealm commented 1 month ago

I'm using Apache Tika for that. Open WebUI now officially supports it for text extraction. I installed the latest version of the Docker image, which includes the libraries required to work with PDFs that contain images.
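For anyone wanting to try the Tika route outside of any particular app, here is a minimal sketch of sending a PDF to a standalone Tika server's REST endpoint (`PUT /tika` with `Accept: text/plain`) and asking its PDF parser to OCR embedded images. It assumes a Tika server listening on `localhost:9998` (Tika server's default port) with Tesseract available inside the container; the helper names `build_tika_request` and `extract_text` are hypothetical, and this is not how AnythingLLM or Open WebUI wire Tika in internally.

```python
import urllib.request

# Assumption: a Tika server (e.g. from an apache/tika Docker image) is running
# locally on its default port. Adjust for your own deployment.
TIKA_URL = "http://localhost:9998/tika"

# Values accepted by Tika's PDF parser OCR strategy setting.
OCR_STRATEGIES = {"no_ocr", "ocr_only", "ocr_and_text_extraction", "auto"}


def build_tika_request(pdf_bytes: bytes, ocr_strategy: str = "auto",
                       tika_url: str = TIKA_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request asking Tika for plain text."""
    if ocr_strategy not in OCR_STRATEGIES:
        raise ValueError(f"unknown OCR strategy: {ocr_strategy!r}")
    headers = {
        "Accept": "text/plain",            # ask for extracted text, not XHTML
        "Content-Type": "application/pdf",
        # Per-request PDF parser setting; OCR only works if Tesseract is
        # installed in the Tika server (assumed, e.g. a "-full" image).
        "X-Tika-PDFOcrStrategy": ocr_strategy,
    }
    return urllib.request.Request(tika_url, data=pdf_bytes,
                                  headers=headers, method="PUT")


def extract_text(pdf_path: str, ocr_strategy: str = "auto") -> str:
    """Send a local PDF to the Tika server and return the extracted text."""
    with open(pdf_path, "rb") as f:
        req = build_tika_request(f.read(), ocr_strategy)
    # OCR can be slow on image-heavy PDFs, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `extract_text("scanned.pdf", "ocr_and_text_extraction")` would return the text layer plus OCR output for embedded images. Because the OCR happens in Tika rather than in a vision-capable LLM, this kind of step stays independent of the chat model provider.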