amscotti / local-LLM-with-RAG

Running local Large Language Models (LLM) to perform Retrieval-Augmented Generation (RAG)
MIT License

Get the context data #12

Open carinnanunest opened 2 months ago

carinnanunest commented 2 months ago

Hello,

Thanks for sharing your work!

I am conducting some experiments with your code and trying to evaluate hallucinations and other metrics. However, I have tried everything and cannot get the {context} returned at result = final_chain.stream({"question": question, "memory": memory, "context": ??}). With the context available it would be possible to run such evaluations, especially since I am using LangSmith.

Do you have any idea that could help me? Thank you very much, and I'm sorry if this is an inconvenience.

amscotti commented 2 months ago

Typically, I would be more than happy to help you experiment with the code to get what you need. However, this week is extremely busy for me, so I apologize for not being able to help further. Nevertheless, here are some ideas (which could be completely wrong):

The UI and CLI use different code paths. The UI is focused on streaming, while the CLI also streams but then produces the final result. You can see this here: https://github.com/amscotti/local-LLM-with-RAG/blob/main/llm.py#L129

I believe the items retrieved from the vector database are in the results. Again, I'm not 100% sure without opening up the project, but I do know that at one point, I had the documents displayed in the results.
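As a rough sketch of what I mean (not verified against the current code): if the chain bundles the retrieved documents into its output dict the way LangChain's create_retrieval_chain does, they would show up under a "context" key. The key name and the use of invoke() instead of stream() are assumptions here.

```python
# Hypothetical sketch: inspect the retrieved documents in the chain's output.
# Assumes `final_chain` returns a dict that includes the retrieved Documents
# under a "context" key (as LangChain's create_retrieval_chain does); this may
# not match the project's actual chain.
result = final_chain.invoke({"question": question, "memory": memory})

for doc in result.get("context", []):
    # Each retrieved chunk is a LangChain Document with metadata and text.
    print(doc.metadata.get("source"), doc.page_content[:200])
```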

Another approach you could take is to add your chain after the documents are retrieved, record the context, and then pass the information on to the rest of the chain.
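Here's a minimal sketch of that idea, assuming a LangChain-style chain. The `retriever` and `llm` names are placeholders for whatever llm.py actually builds, and the prompt here is not the project's; treat it as an illustration, not the project's code.

```python
# Minimal sketch: capture the retrieved context with a side-effecting step
# so it can be logged or evaluated later.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# `retriever` and `llm` are assumed to already exist (e.g. the Chroma
# retriever and Ollama chat model the project sets up); placeholders here.
prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

captured = {}  # holds the context from the most recent call


def format_docs(docs):
    # Join retrieved chunks into a single string for the prompt.
    return "\n\n".join(doc.page_content for doc in docs)


def record_context(inputs):
    # Side effect: stash the formatted context so it can be inspected later.
    captured["context"] = inputs["context"]
    return inputs


chain = (
    RunnablePassthrough.assign(
        context=lambda x: format_docs(retriever.invoke(x["question"]))
    )
    | RunnableLambda(record_context)
    | prompt
    | llm
)

answer = chain.invoke({"question": "What does the paper say about RAG?"})
print(captured["context"])
```

If LangSmith tracing is enabled, that recording step should also appear as its own run in the trace, which may already give you the context alongside the model's answer for your evaluations.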

I'm not sure of the scope of your project or what you're trying to find out, but I can tell you with certainty that even with context from RAG, LLMs are still able to hallucinate. Also, the RAG implementation in this project is fairly basic; there are better approaches discussed online for decreasing hallucinations. If you are just trying to gather data, and this particular code does not matter, maybe a project like https://www.promptfoo.dev/ or https://www.trulens.org/ would be better at evaluating the output of an LLM.

If not, at some point next week, I can dig into this a bit more and provide code examples if you still can't get the context.