explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Output testset is malformed when using the Llama 3 8B Instruct model. #1000

Open · Nandakishore-Thekkadathu opened this issue 3 weeks ago

Nandakishore-Thekkadathu commented 3 weeks ago

The testset I generated using the Llama 3 8B Instruct model has problematic output: unnecessary preamble phrases appear alongside the questions. I have given examples below.

0: 'Here is a question that can be fully answered from the given context using the keyphrase "Employee Self Service":

What is the procedure for an employee to apply for marriage leave through the Employee Self Service portal?'

1: 'Here is a rewritten version of the question:

"What's the daily limit for foreign exchange expenses when booking travel?"

I shortened the question by removing unnecessary words and used an abbreviation ("expenses" instead of "foreign exchange entitlements"). I also rephrased the question to make it more concise and indirect, while still conveying the same meaning.'

How can I solve this? Can I adjust the prompts sent to the model? If so, how?
jjmachan commented 3 weeks ago

hey @Nandakishore-Thekkadathu, thank you for raising this issue 🙂

This is a hard one to debug because Llama 3 8B is a smaller model and doesn't have enough parameters to give useful results. Any chance you could use other models, like the bigger ones (GPT-4, Claude, etc.)?
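If you do switch models, a minimal sketch of pointing the testset generator at a larger LLM might look like the following (assuming the ragas 0.1.x `TestsetGenerator.from_langchain` API together with LangChain's OpenAI wrappers; the model names and the `docs/` path are placeholders, not something from this issue):

```python
# Minimal sketch: generate a testset with a larger model (e.g. GPT-4) instead of
# Llama 3 8B. Assumes ragas 0.1.x, langchain-openai, and langchain-community.
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# Load the source documents the questions should be generated from.
documents = DirectoryLoader("docs/").load()  # placeholder path

generator_llm = ChatOpenAI(model="gpt-4")  # writes the questions/answers
critic_llm = ChatOpenAI(model="gpt-4")     # filters low-quality generations
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
print(testset.to_pandas().head())
```

In practice, a stronger generator/critic pair tends to follow the prompt's output format more closely, which should remove the extra preamble text seen above.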