[FEAT] Vision support in PDF parsing

Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.

https://anythingllm.com

MIT License

27.89k stars, 2.83k forks

[FEAT] Vision support in PDF parsing #510

Open huicewang opened 11 months ago

huicewang commented 11 months ago

Are knowledge bases that consist of images, or of PDF files containing images, currently unsupported? Will OCR for images be supported in the future? If so, when can we expect it to be implemented? Thank you.

timothycarambat commented 11 months ago

We currently do not have a timeline or plan to implement full OCR scanning of PDFs with text + images unless it can be done independently of the model selected by the instance. Not all model providers support a form of vision, and even then, we do not use an LLM to parse PDF text.

Will mark as a feature request.
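
For illustration only, here is a minimal sketch of what "independent of the selected model" could look like: the regular text layer is used whenever it exists, and a pluggable OCR function is called only for pages that come back nearly empty. The names `Page`, `OcrBackend`, `page_text`, and the `min_chars` threshold are all hypothetical and are not taken from the AnythingLLM codebase.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types for illustration; none of these names come from AnythingLLM.
OcrBackend = Callable[[bytes], str]  # takes image bytes, returns recognized text


@dataclass
class Page:
    text_layer: str          # text extracted by the regular PDF parser
    image: Optional[bytes]   # rendered page image, if one is available


def page_text(page: Page, ocr: Optional[OcrBackend] = None, min_chars: int = 20) -> str:
    """Return the text layer, falling back to OCR when the layer is nearly empty."""
    text = page.text_layer.strip()
    # Keep the parsed text if it looks sufficient, or if OCR is not possible.
    if len(text) >= min_chars or ocr is None or page.image is None:
        return text
    return ocr(page.image).strip()
```

Because the OCR step is just a callable, it could wrap a local OCR engine or a hosted service without the parser depending on which LLM the instance uses.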

fmg-tomdifulvio commented 4 months ago

Could you allow entering an API key for Azure Document Intelligence? Or even for GPT-4o...

oatmealm commented 3 months ago

I'm using Tika for that. Open WebUI now officially supports it for text extraction. The latest Docker image comes installed with the libraries required to work with PDFs containing images.
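
As a rough sketch of the Tika approach, the snippet below sends a PDF to a Tika server's `/tika` endpoint and asks for plain text back. It assumes a Tika server is already running and reachable at `http://localhost:9998` (the default port); the helper names `build_tika_request` and `extract_text` are made up for this example. Whether text inside images is recognized depends on the OCR libraries available to that server, so treat this as a starting point rather than a tested integration.

```python
import urllib.request

# Assumption: a Tika server is reachable at this address (9998 is its default port).
TIKA_URL = "http://localhost:9998"


def build_tika_request(pdf_bytes: bytes, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking Tika's /tika endpoint for plain text."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tika",
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf", "Accept": "text/plain"},
    )


def extract_text(pdf_path: str, base_url: str = TIKA_URL) -> str:
    """Send a PDF file to the Tika server and return the extracted text."""
    with open(pdf_path, "rb") as f:
        request = build_tika_request(f.read(), base_url)
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read().decode("utf-8")
```

With a server running, `extract_text("scan.pdf")` would return whatever text Tika could extract from that (hypothetical) file.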