Azure-Samples / ai-rag-chat-evaluator

Tools for evaluation of RAG Chat Apps using Azure AI Evaluate SDK and OpenAI
MIT License

How can I create QA pairs in another language? #35

Closed cpatrickalves closed 5 months ago

cpatrickalves commented 5 months ago

I use the qa_generator = QADataGenerator(model_config=openai_config) with Brazilian Portuguese texts, but the QA pairs generated are always in English. Do you know how I can fix that? I could not find any place where I could tweak the prompt to force outputs in Portuguese. I am using Azure OpenAI with GPT-4.

This issue is for a: (mark with an x)

- [x] documentation issue or request
pamelafox commented 5 months ago

Unfortunately, I believe the azure-ai-generative SDK package only works well for English generation and evaluation right now, as the prompts are written in English and the GPT models tend to skew toward English when in doubt.

One workaround is to use a similar prompt written in Portuguese. Here's the one for long-answer pairs: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-generative/azure/ai/generative/synthetic/templates/prompt_qa_long_answer.txt
Here's the code that formats it, calls the model, and extracts pairs from the completion: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-generative/azure/ai/generative/synthetic/qa.py#L188

You could also try forking the azure-sdk-for-python and replacing that prompt.
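A minimal sketch of that approach, assuming the `[Q]:`/`[A]:` delimiter format used by the SDK's QA templates (the Portuguese prompt text and the `parse_qa_pairs` helper below are illustrative, not part of the SDK; verify the delimiters against the actual `prompt_qa_long_answer.txt` before relying on this):

```python
import re

# Hypothetical Portuguese variant of the long-answer QA prompt. The
# [Q]:/[A]: markers mirror the format the SDK's extraction code expects.
PROMPT_QA_PT = """Dado o texto abaixo, gere {num_questions} pares de pergunta e resposta em português.
Use exatamente este formato:
[Q]: <pergunta>
[A]: <resposta>

Texto:
{text}
"""


def parse_qa_pairs(completion: str) -> list[tuple[str, str]]:
    """Extract (question, answer) pairs from a [Q]:/[A]:-formatted completion."""
    pattern = re.compile(
        r"\[Q\]:\s*(.*?)\s*\[A\]:\s*(.*?)(?=\n\[Q\]:|\Z)", re.DOTALL
    )
    return [(q.strip(), a.strip()) for q, a in pattern.findall(completion)]
```

You would format `PROMPT_QA_PT`, send it as a chat completion to your GPT-4 deployment, and run the response through `parse_qa_pairs`.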

I'll discuss with the generative team and file an issue about non-English support.

pamelafox commented 5 months ago

Filed https://github.com/Azure/azure-sdk-for-python/issues/34099

cpatrickalves commented 5 months ago

Thanks a lot @pamelafox for your quick response.

I will check the prompts you've sent. In the meantime, I've added another call to the model to translate the generated QA pairs to Portuguese.
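That translation step could be sketched like this, with the model call abstracted behind a `translate` callable (the function name, the dict keys, and the commented-out chat-completion call are assumptions for illustration, not the actual code from this issue):

```python
from typing import Callable


def translate_qa_pairs(
    qa_pairs: list[dict],
    translate: Callable[[str], str],
) -> list[dict]:
    """Translate the question/answer fields of generated QA pairs.

    `translate` is any text -> text function. In practice it might wrap a
    chat-completion call, roughly (hypothetical, untested):

        def translate(text):
            resp = client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system",
                     "content": "Translate to Brazilian Portuguese."},
                    {"role": "user", "content": text},
                ],
            )
            return resp.choices[0].message.content
    """
    return [
        {
            **pair,
            "question": translate(pair["question"]),
            "answer": translate(pair["answer"]),
        }
        for pair in qa_pairs
    ]
```

Keeping the translator pluggable makes the post-processing step easy to unit-test with a stub before wiring in the real model call.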

pamelafox commented 5 months ago

@cpatrickalves Ah, good idea! Are you also intending to run evaluate()? The SDK team also considers that English-only, but I wonder if it might work alright anyway, given the LLM understands Portuguese. Let us know how it goes.

cpatrickalves commented 5 months ago

> Are you also intending to run evaluate()? The SDK team also considers that English-only

Oh, that's bad. But I will try it and let you know.

cpatrickalves commented 5 months ago

@pamelafox

I've checked the prompts used to compute the metrics (https://learn.microsoft.com/pt-br/azure/ai-studio/concepts/evaluation-metrics-built-in) and I really think they can be used with languages other than English, mainly because GPT-4 computes them and that model works well across many languages. Alternatively, a user could translate the prompts into the desired language and use them outside of Azure ML, although I don't think that would make much difference in the results.

I've tried PT-BR and the results seem reliable. Any thoughts?

pamelafox commented 5 months ago

I am unsurprised to hear that the results seem reasonable, as GPT models are used to receiving instructions in English while dealing with non-English content. It's possible the metrics may evolve in ways that don't work as well without explicit localization, for example if they start adding more steps beyond a simple prompt. If it's working well for you now, at least in terms of relative scores, then it seems fine to use.