Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

QADataGenerator doesn't generate QA pairs in non-English languages #34099

Open pamelafox opened 8 months ago

pamelafox commented 8 months ago

Describe the bug

When we use QADataGenerator(model_config=openai_config) with Brazilian Portuguese text, the generated QA pairs are always in English.

I believe this is a known limitation, but it needs to be documented clearly, as customers expect LLM-based tools to work in non-English languages.

To Reproduce

Pass in Portuguese text to this code:

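    from azure.ai.generative.synthetic.qa import QADataGenerator, QAType

    # openai_config: the model configuration; text: a Brazilian Portuguese passage
    qa_generator = QADataGenerator(model_config=openai_config)
    result = qa_generator.generate(
        text=text,
        qa_type=QAType.LONG_ANSWER,
        num_questions=2,
    )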

The generated questions and answers come back in English instead.

Expected behavior

Questions and answers generated in Portuguese, matching the language of the input text.
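One rough way to turn this expectation into a check is sketched below. It is not part of the SDK: `guess_language`, `pairs_not_in_language`, the stopword sets, and `sample_pairs` are all hypothetical, and the sample pairs are hand-written stand-ins rather than real generator output. The heuristic only counts a few common function words, so it is an assumption-laden illustration and not a proper language detector.

```python
import re

# Small hand-picked stopword sets (an assumption for illustration only).
# Words shared by both languages, such as "a", "as", "do", "no", are left out.
PT_STOPWORDS = {"o", "de", "que", "e", "da", "em", "um", "uma",
                "para", "com", "não", "os", "é", "por", "se", "na"}
EN_STOPWORDS = {"the", "of", "and", "to", "in", "is", "that", "it", "for",
                "with", "on", "are", "this", "was", "by", "what", "does",
                "be", "an"}


def guess_language(text: str) -> str:
    """Return "pt", "en", or "unknown" based on stopword counts."""
    # [^\W\d_]+ matches runs of Unicode letters, so accented words survive.
    words = re.findall(r"[^\W\d_]+", text.lower())
    pt_hits = sum(w in PT_STOPWORDS for w in words)
    en_hits = sum(w in EN_STOPWORDS for w in words)
    if pt_hits == en_hits:
        return "unknown"
    return "pt" if pt_hits > en_hits else "en"


def pairs_not_in_language(qa_pairs, expected="pt"):
    """Return the (question, answer) pairs whose guessed language differs."""
    return [(q, a) for q, a in qa_pairs
            if guess_language(f"{q} {a}") != expected]


# Hypothetical sample data standing in for generated QA pairs.
sample_pairs = [
    ("Qual é a capital do Brasil?", "A capital do Brasil é Brasília."),
    ("What is the capital of Brazil?", "The capital of Brazil is Brasília."),
]

mismatched = pairs_not_in_language(sample_pairs)
print(f"{len(mismatched)} of {len(sample_pairs)} pairs are not in Portuguese")
```

With the sample data above, the second pair is flagged because it is written in English. For real use, a dedicated language-identification library would likely be more reliable than this word-count heuristic.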

l0lawrence commented 8 months ago

Hi @pamelafox, thank you for the feedback. We are forwarding your request to @azureml-github, and they will get back to you as soon as possible.

diondrapeck commented 8 months ago

@pamelafox Thank you for bringing this to our attention. We will update the reference documentation to make the language limitations clear.