Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

Cannot get consistent results from "Chat with document" #2254

Closed mtomas7 closed 1 month ago

mtomas7 commented 1 month ago

Setup:

So first, it is not clear whether drop-in docs and RAG-uploaded docs are treated in the same manner, or whether drop-in docs are added to the Workspace vector DB.

Comparing the drop-in doc feature with LM Studio: when you drop a file in, it evaluates whether there is enough memory to load the whole document and, if so, does exactly that. Answer quality is much better. Perhaps it would be possible to implement this memory check in AnythingLLM too?
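For illustration, a minimal TypeScript sketch of the kind of check being described, assuming a rough ~4-characters-per-token estimate and an arbitrary headroom value; this is not LM Studio's or AnythingLLM's actual code:

```ts
// Hypothetical sketch: rough check for whether a document fits in the
// model's context window. The ~4 chars/token ratio is a common rule of
// thumb for English text, not an exact tokenizer count.
const CHARS_PER_TOKEN = 4;

function fitsInContext(
  documentText: string,
  contextWindowTokens: number,
  reservedTokens = 1024, // assumed headroom for system prompt + reply
): boolean {
  const estimatedTokens = Math.ceil(documentText.length / CHARS_PER_TOKEN);
  return estimatedTokens <= contextWindowTokens - reservedTokens;
}

// Example: an 8k-context model with a 20k-character document
console.log(fitsInContext("x".repeat(20_000), 8192)); // true (~5000 tokens)
```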

ValfarDeveloper commented 1 month ago

Having the same issue here, using Milvus. Could it be related to the embedding model?

timothycarambat commented 1 month ago

> Comparing the drop-in doc feature with LM Studio: when you drop a file in, it evaluates whether there is enough memory to load the whole document and, if so, does exactly that. Answer quality is much better. Perhaps it would be possible to implement this memory check in AnythingLLM too?

This is because LM Studio injects the entire document into the context window. We have RAG with a vector DB built in; LM Studio does not. The reason you get better results is that the whole document is in context!

We preserve context windows by default with RAG - but you can always replicate the same behavior with document pinning. Use that if the entire document needs to be injected. Otherwise, you can mess with the controls in this doc to get better RAG results.

https://docs.anythingllm.com/llm-not-using-my-docs
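To make the trade-off above concrete, here is a hedged TypeScript sketch contrasting the two strategies the maintainer describes. Names like `vectorSearch` and `Chunk` are illustrative stand-ins, not AnythingLLM's real API:

```ts
interface Chunk {
  text: string;
  score: number;
}

// Stub standing in for the vector DB's similarity search; real results
// would come from Milvus, LanceDB, etc., not this placeholder.
async function vectorSearch(query: string, topK: number): Promise<Chunk[]> {
  return Array.from({ length: topK }, (_, i) => ({
    text: `chunk ${i} loosely relevant to "${query}"`,
    score: 1 - i * 0.1,
  }));
}

// Document pinning: the full document text is injected into every prompt,
// which consumes context but guarantees nothing is missed.
function pinnedContext(fullDocument: string): string {
  return fullDocument;
}

// RAG: only the top-k most similar chunks are retrieved per query. This
// preserves the context window but can miss passages if the chunking or
// similarity settings are unfavorable.
async function ragContext(query: string, topK = 4): Promise<string> {
  const chunks = await vectorSearch(query, topK);
  return chunks.map((c) => c.text).join("\n---\n");
}
```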

mtomas7 commented 1 month ago

@timothycarambat Thank you for the clarification! I'm trying to look at this from a practical perspective: an average user will not care much about the internal process, e.g. which embedding method was used, but will care a great deal about the quality of the answer. I'm preparing to present this to new users, and I know that some of them already have the habit of dropping a file into the conversation, as is done in ChatGPT.

Would it make sense to do this pre-loading evaluation and suggest that the user use document pinning if the file is not very large? That way the user interacts with the AI model in what already feels like the "standard way" and gets good results. Or perhaps document pinning should always be used when a document is dropped into the chat window? In the case of a large file or multiple files, the user would be guided to create a RAG instance.

Thank you!
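A minimal sketch of the routing heuristic proposed above, reusing the rough token estimate from earlier; the half-window threshold and function name are assumptions for illustration, not an AnythingLLM feature:

```ts
// Hypothetical routing: when a file is dropped into the chat, estimate
// its size and either pin it (small file, whole document fits) or embed
// it into the workspace vector DB (large file or many files).
type DropAction = "pin-document" | "embed-for-rag";

function suggestDropAction(
  documentText: string,
  contextWindowTokens: number,
): DropAction {
  // Rough ~4 chars/token estimate; leave half the window free for the
  // conversation itself (an assumed, tunable threshold).
  const estimatedTokens = Math.ceil(documentText.length / 4);
  return estimatedTokens <= contextWindowTokens / 2
    ? "pin-document"
    : "embed-for-rag";
}

// Example: a short memo pins; a book-length file goes to the vector DB.
console.log(suggestDropAction("short memo ".repeat(100), 8192)); // "pin-document"
```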

mtomas7 commented 1 month ago

And, not to forget, it looks like these points could potentially be improved in the app: