Closed Said-Apollo closed 5 days ago
Not yet, but that's definitely something I'd like to add support for. I'm curious to hear more about your use case. Would you need to be able to search over a combination of text documents and images? Or would the images be embedded in the documents?
Currently our use case is a bunch of scraped webpages (HTML), each saved in a txt file. Sometimes these files also contain URLs of images. When we run our RAG system, it returns the most similar chunks (based on the HTML content, which contains not only image URLs but also descriptions of, e.g., how to use a program).
For now it works pretty well. For the next phase, however, it might be better to also extend this to PDF or docx files.
Let's now say we have a PDF containing text and images. If the system detects an image, it could take the text before and after it (+ description) and chunk it all together. However, it also needs to somehow preserve the image so that it can be returned if the RAG system retrieves this chunk.
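To make the chunking idea concrete, here is a minimal sketch in Python. It assumes the document has already been parsed into an ordered list of text and image blocks; the `Block` and `Chunk` types, the `chunk_around_images` function, and the `window` parameter are all hypothetical names for illustration, not part of any existing library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    kind: str     # "text" or "image" (hypothetical block types)
    content: str  # the text itself, or a path/URL for an image


@dataclass
class Chunk:
    text: str                                            # surrounding text used for retrieval
    image_refs: List[str] = field(default_factory=list)  # images preserved with the chunk


def chunk_around_images(blocks: List[Block], window: int = 1) -> List[Chunk]:
    """Build one chunk per image from the blocks within `window` positions of it."""
    chunks = []
    for i, block in enumerate(blocks):
        if block.kind != "image":
            continue
        lo = max(0, i - window)
        hi = min(len(blocks), i + window + 1)
        neighbours = blocks[lo:hi]
        # Text before and after the image is joined in document order.
        text = "\n".join(b.content for b in neighbours if b.kind == "text")
        # Image references travel with the chunk so they can be returned later.
        refs = [b.content for b in neighbours if b.kind == "image"]
        chunks.append(Chunk(text=text, image_refs=refs))
    return chunks


blocks = [
    Block("text", "Open the settings menu."),
    Block("image", "img/settings.png"),
    Block("text", "Click Save."),
]
chunks = chunk_around_images(blocks)
```

Because each chunk carries its own `image_refs`, whatever retrieves the chunk can hand the image back alongside the text.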
Ok yeah, that's pretty much how I'm thinking about it. You identify the images in the document and save those as image files somewhere so they can be returned in the search results when needed. And for the purposes of embedding and reranking, you use an LLM-generated description of the image instead of the image itself. Should work basically the same for any file type.
I think the biggest challenge will just be reliably extracting images from a wide variety of file types.
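A rough sketch of that flow in Python, under stated assumptions: the image bytes have already been extracted from the source file, and `describe` stands in for whatever LLM call produces the description. `IndexedChunk`, `save_image`, and `index_image` are hypothetical names for illustration and are not this project's API.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class IndexedChunk:
    embed_text: str            # text that gets embedded and reranked
    image_path: Optional[str]  # file returned with search results (None for plain text)


def save_image(image_bytes: bytes, out_dir: Path, ext: str = "png") -> Path:
    """Write the image to disk under a content-derived name and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(image_bytes).hexdigest()[:16]
    path = out_dir / f"{name}.{ext}"
    path.write_bytes(image_bytes)
    return path


def index_image(
    image_bytes: bytes,
    out_dir: Path,
    describe: Callable[[bytes], str],  # assumed wrapper around an LLM captioning call
) -> IndexedChunk:
    """Save the image and pair its path with a description used for embedding."""
    path = save_image(image_bytes, out_dir)
    return IndexedChunk(embed_text=describe(image_bytes), image_path=str(path))
```

Only `embed_text` would go through the embedding model and reranker; `image_path` is just carried along so the original image can be shown when the chunk is retrieved. Naming files by content hash means the same image extracted twice is stored once.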
Closed by #67
Does this RAG system also support returning images? For example, I would like to use it as a "support chatbot" that explains to users step by step (with some example images) how to use a software product.