explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0
6.6k stars · 648 forks

Support multi language testset generation #439

Closed grauvictor closed 7 months ago

grauvictor commented 8 months ago

Testset generation doesn't seem to support multiple languages.

One solution would be to use an adapt function to translate the generation prompts, in the same way as is done for metrics: https://github.com/explodinggradients/ragas/blob/27e48b021a1cb25a04920924e33509fe758fa87b/src/ragas/metrics/_faithfulness.py#L203-L212
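For illustration, the adaptation pattern suggested here boils down to running each few-shot example in a prompt through an LLM translation step while keeping the prompt's structure intact. A minimal sketch of that idea (the `translate` callable below is a stand-in for a real LLM call, not ragas code):

```python
# Sketch of the prompt-adaptation pattern: translate every string in a
# prompt's few-shot examples into the target language, preserving keys.
# `translate` is a placeholder for a real LLM translation call.

def adapt_prompt(examples, language, translate):
    """Return a copy of `examples` with every string value translated."""
    adapted = []
    for example in examples:
        adapted.append(
            {key: translate(value, language) for key, value in example.items()}
        )
    return adapted

# Stub translator used for demonstration only.
def fake_translate(text, language):
    return f"[{language}] {text}"

examples = [{"question": "Where is the Eiffel Tower?", "answer": "Paris"}]
print(adapt_prompt(examples, "german", fake_translate))
```

The real implementation would additionally cache the translated prompts so the translation cost is paid once per language, which is what the metrics `adapt`/`save` pair does.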

jjmachan commented 8 months ago

Yes, we are going to bring that same functionality to testset generation too - I'm glad you brought it up 🙂. We also had a few ideas in mind about this; could we run them by you sometime when you're free?

grauvictor commented 8 months ago

> Yes, we are going to bring that same functionality to testset generation too - I'm glad you brought it up 🙂. We also had a few ideas in mind about this; could we run them by you sometime when you're free?

Thx, of course :)

Gr33nLight commented 7 months ago

Hello! I stumbled into the same issue, are there any updates on this, maybe a workaround? Thanks!! @jjmachan @grauvictor
I'm specifically using the generate_with_langchain_docs to generate initial test data.

drdsgvo commented 7 months ago

I would also be interested in a multi-language feature, as I am working with German texts. Is there any update on this issue?

shahules786 commented 7 months ago

Hey guys, yes, this is now possible with language adaptation for test generation. Just follow the guide here and enjoy :) Install ragas from source before doing so. @Gr33nLight @drdsgvo @grauvictor

drdsgvo commented 7 months ago

Thank you a lot for your quick and helpful answer. I tried it out with German language and German wikipedia articles. From 6 answers generated, 3 are not usable:

- 1 answer to a generated conditional question is empty
- 1 answer to a generated conditional question is (in English!): "Sorry, I cannot translate a negative number. Please provide a valid input"
- 1 answer to a generated reasoning question is (in English!): "Sorry I cannot answer this question as the information provided in the context does not mention... <here comes a question-specific problem>"

The other 3 answers are in German and as expected. All questions are good.

But with 3 out of 6 answers generated being no-answers, the approach is not feasible as of now. Any ideas?

shahules786 commented 7 months ago

Hey @drdsgvo , which models are you using?

drdsgvo commented 7 months ago

> Hey @drdsgvo , which models are you using?

Just the default ones. I did not change anything; I copy-pasted the example code from your guide with some minor adaptations (like changing the language to 'german' instead of 'hindi').

shahules786 commented 7 months ago

Thanks, bro @drdsgvo , I will work on it. Would also love to chat with you to understand your application for it, if you're free sometime this week or later. calendly

drdsgvo commented 7 months ago

> Thanks, bro @drdsgvo , I will work on it. Would also love to chat with you to understand your application for it, if you're free sometime this week or later. calendly

With the synthetic questions and answers generated, we want to see if we can train an LLM from scratch for question answering with a given context.

shahules786 commented 7 months ago

Hey @drdsgvo , just raised a PR for #599 fixing some evolution flows, which solves this problem (90% fill rate). Here is a sample of Spanish data I generated from Wikipedia:

[Screenshot: sample of generated Spanish test data]

drdsgvo commented 7 months ago

> Hey @drdsgvo just raised a PR for #599 fixing some evolution flows which solves this problem (90% filling rate). Here is a sample of Spanish data I generated from Wikipedia

Thank you very much for your quick response and coding. I tried it (updated to the latest ragas version, no modifications to my code). An exception is raised at line 75 of https://github.com/explodinggradients/ragas/commit/fe0bcc497ec34170efb7a5097e26e651d14cd2a2

Exception message: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
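For context, that message is NumPy's standard complaint when a multi-element array is used where Python expects a single True/False, such as in an `if` condition. A minimal reproduction, along with the aggregation the error message suggests:

```python
import numpy as np

scores = np.array([0.2, 0.9])

# Using a multi-element array in a boolean context raises ValueError:
# "The truth value of an array with more than one element is ambiguous."
try:
    if scores > 0.5:
        pass
except ValueError as exc:
    print(exc)

# The fix is to state the intended aggregation explicitly:
any_pass = (scores > 0.5).any()  # True if at least one element passes
all_pass = (scores > 0.5).all()  # True only if every element passes
print(any_pass, all_pass)
```

Which of `.any()` or `.all()` is correct depends on what the surrounding ragas code intends, so the fix belongs in the library rather than in user code.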

c-yeh commented 7 months ago

Facing the same problem as @drdsgvo on Japanese docs.

shahules786 commented 7 months ago

Hey @c-yeh @drdsgvo, sorry guys, that was poor testing on my part. I have merged a fix for it. Can you please install it from source and try again? We will make a release later this week.

if you have any concerns or ideas on how to improve this feature, feel free to bug me here

c-yeh commented 7 months ago

@shahules786 Thanks for the quick response.

Using 0.1.2.dev8+gc18c7f4, the generator.generate_with_langchain_docs() part seems to proceed, but it has been 30 minutes now and it neither errors out nor finishes. Is it supposed to take this long for test_size=10 (~1k docs in total)?

```python
testset = generator.generate_with_langchain_docs(
    documents=lc_docs,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```

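As an aside, since `distributions` describes a probability distribution over question types, its values are presumably expected to sum to 1 (an assumption, not something stated in this thread). A quick sanity check in plain Python, with string keys standing in for the ragas evolution objects:

```python
# Hypothetical stand-ins for the ragas evolution objects (simple, reasoning,
# multi_context); only the weights matter for this check.
distributions = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}

# Verify the weights form a valid distribution before passing them on.
total = sum(distributions.values())
assert abs(total - 1.0) < 1e-9, f"distributions sum to {total}, expected 1.0"
```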
shahules786 commented 7 months ago

@c-yeh can you decrease the number of docs (<100) and try again? I would recommend gradually increasing the number of docs and the test size once you have some results.
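The suggested ramp-up can be scripted as a small measurement loop. In this sketch, `fake_generate` is a stub standing in for the real `generate_with_langchain_docs` call, purely to show the shape of the loop:

```python
import time

def benchmark(generate, doc_counts, test_size):
    """Time a generation call at increasing document counts."""
    timings = []
    for n_docs in doc_counts:
        start = time.perf_counter()
        generate(n_docs=n_docs, test_size=test_size)
        timings.append((n_docs, time.perf_counter() - start))
    return timings

# Stub generator for demonstration; replace with the real ragas call.
def fake_generate(n_docs, test_size):
    time.sleep(0.001 * n_docs)

for n_docs, seconds in benchmark(fake_generate, [50, 100, 200], test_size=10):
    print(f"{n_docs} docs: {seconds:.3f}s")
```

Plotting the resulting timings against the document count makes it easy to see whether the growth is roughly linear or starts to explode at some point.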

c-yeh commented 7 months ago

@shahules786

> can you decrease the number of docs (<100) and try again?

Tried with 50 docs and generator.generate_with_langchain_docs() finished in ~4 min.

Edit:
- up to 200 docs finished in ~4 min
- 500 docs + test_size=10 = 10 min
- 500 docs + test_size=20 = 12 min

So at least it no longer errors out like before. However, the time growth appears to be non-linear.

> I would recommend to gradually increase the number of docs and test size

Do you mean there is possibly a point beyond which the generation time starts to explode? I think we would normally expect the time to grow linearly with test_size (and with the number of docs, up to a limit), right?

What is a reasonable/normal time for the generation of test_size=10?

c-yeh commented 7 months ago

I found the reason for the slowness above and have raised an issue: #642