deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Add support for llama.cpp llm evaluator #7718

Closed · lbux closed this issue 1 month ago

lbux commented 4 months ago

Is your feature request related to a problem? Please describe.
As of now, Haystack's evaluators that extend LLMEvaluator only support OpenAI. I would like support through llama.cpp to be added for local/offline/"free" evaluation.

Describe the solution you'd like
After implementing LlamaCppChatGenerator (https://github.com/deepset-ai/haystack-core-integrations/pull/723), it is possible to constrain the output to JSON. As such, we can split the evaluator's instructions into ChatMessages (a system message for the instructions/examples, a user message for the input tuples) and have the result returned in the expected JSON format.

I have a WIP implementation, and I would like feedback on how to handle the integration of llama.cpp into the existing evaluators.

As of now, I am manually calling the chat generator in llm_evaluator to confirm that this is possible, hard-coding the model and the generation kwargs.
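Roughly, that manual wiring looks like the sketch below. This is only a sketch: the import path, model file, generation kwargs, and the response_format JSON constraint are placeholder assumptions rather than a final API.

```python
# Sketch of manually driving LlamaCppChatGenerator inside llm_evaluator.
# Model path, context size, and generation kwargs are hard-coded assumptions.
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.llama_cpp import LlamaCppChatGenerator

generator = LlamaCppChatGenerator(
    model="models/openchat-3.5.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=4096,
    generation_kwargs={
        # llama.cpp can constrain chat completions to valid JSON
        "response_format": {"type": "json_object"},
        "temperature": 0.0,
    },
)
generator.warm_up()

messages = [
    # system message: evaluator instructions plus few-shot examples
    ChatMessage.from_system(
        'You are an evaluator. Respond only with JSON of the form {"score": 0 or 1}.'
    ),
    # user message: one input tuple to evaluate
    ChatMessage.from_user('{"predicted_answer": "Paris is the capital of France."}'),
]

result = generator.run(messages=messages)
print(result["replies"][0])  # expected to contain a JSON string such as {"score": 1}
```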

My two different ideas to resolve this are as follows:

Another idea, though I don't think it would be ideal, is to have separate evaluator components just for llama.cpp.

If I can get some feedback, I can submit a PR with revised changes in a couple of days.

Describe alternatives you've considered
N/A

Additional context
N/A

shadeMe commented 4 months ago

Thanks for your help with this! Instead of passing generator instances, we can just expand the current approach in the following manner:
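The snippet from the original comment is not reproduced here. One plausible reading of "expand the current approach", given that LLMEvaluator already takes an api string, is to accept additional backend names plus a dict of backend parameters instead of generator instances; the parameter names below are assumptions for illustration only.

```python
# Hypothetical sketch: extend LLMEvaluator's constructor instead of accepting
# generator instances. The api/api_params names are assumptions, not the real API.
from typing import Any, Dict, List, Optional, Tuple


class LLMEvaluator:
    def __init__(
        self,
        instructions: str,
        inputs: List[Tuple[str, type]],
        outputs: List[str],
        examples: List[Dict[str, Any]],
        api: str = "openai",                          # e.g. "openai" or "llama_cpp"
        api_params: Optional[Dict[str, Any]] = None,  # model path, generation kwargs, ...
    ):
        self.api = api
        self.api_params = api_params or {}
        if api == "openai":
            ...  # build the OpenAI-backed generator, as today
        elif api == "llama_cpp":
            ...  # build a LlamaCppChatGenerator from api_params
        else:
            raise ValueError(f"Unsupported API: {api}")
```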

Sprizgola commented 1 month ago

Hi everyone, is there any update on this issue? I was developing an evaluation pipeline, but unfortunately, due to company policy, I cannot use OpenAI, so this is still not possible for me.

It would be great if either an LLMGenerator instance could be passed as an argument (as @lbux mentioned) or the backend could instantiate the generator itself; I'm happy to help in either case.

shadeMe commented 1 month ago

You can use the OpenAIGenerator without querying OpenAI's models/servers: host a local model with Ollama or another server that exposes OpenAI-compatible endpoints, and point the LLMEvaluator at it as shown in this PR.
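For example, a minimal sketch assuming an Ollama server on its default port and a placeholder model name (the evaluator wiring in the linked PR may use a different parameter to forward the base URL):

```python
# Point Haystack's OpenAIGenerator at a local OpenAI-compatible server (e.g. Ollama).
# The model name and the dummy API key are placeholders.
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

generator = OpenAIGenerator(
    model="llama3",                            # whatever model Ollama is serving
    api_base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key=Secret.from_token("ollama"),       # ignored by Ollama, required by the client
)

print(generator.run(prompt="Reply with the single word OK.")["replies"][0])
```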

Sprizgola commented 1 month ago

I might have missed that PR. I've run some tests locally with LLMEvaluator and ContextRelevanceEvaluator, using a model hosted on Ollama, and they ran without any errors; thanks @shadeMe!
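For reference, the local setup looked roughly like the sketch below. It assumes the installed evaluator version exposes an api_params dict that is forwarded to the underlying OpenAIGenerator (newer releases do; otherwise see the PR linked above for the exact mechanism), and the model name is a placeholder.

```python
# Rough sketch: ContextRelevanceEvaluator against an Ollama-hosted model.
# The api_params forwarding and the model name are assumptions; adjust to your setup.
from haystack.components.evaluators import ContextRelevanceEvaluator
from haystack.utils import Secret

evaluator = ContextRelevanceEvaluator(
    api="openai",                         # reuse the OpenAI backend against a compatible server
    api_key=Secret.from_token("ollama"),  # placeholder; Ollama does not check it
    api_params={
        "model": "llama3",
        "api_base_url": "http://localhost:11434/v1",
    },
)

result = evaluator.run(
    questions=["Who created the Python programming language?"],
    contexts=[["Python was created by Guido van Rossum in the late 1980s."]],
)
print(result["score"])
```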