Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License

Is there a way to search document language other than English? #48

Open andycaho opened 1 year ago

andycaho commented 1 year ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I uploaded some Traditional Chinese documents and followed the steps to parse them with prepdocs, but when I asked questions in Chinese about those documents, the app could not answer any of them.

Any log messages given by the failure

Expected/desired behavior

It can answer questions regardless of the document language.

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?) Windows 11

Versions

Mention any other details that might be useful

I have tried to build my own index and run an indexer with a Chinese analyzer, but it's not working. (I already set the env variable AZURE_SEARCH_INDEX to the new index.) The default content analyzer in gptkbindex is English.


Thanks! We'll be in touch soon.

gonzalorecio commented 1 year ago

In the file ./scripts/prepdocs.py, you should change the function create_search_index() so that the index it creates can be searched in Chinese or other languages. By default, the analyzer language is set to English:

SearchableField(name="content", type="Edm.String", analyzer_name="en.microsoft"),

I suggest changing the code and setting analyzer_name="standard.lucene", which seems to work properly for common languages. For more information on the available languages, refer to the docs: https://learn.microsoft.com/en-us/python/api/azure-search-documents/azure.search.documents.indexes.models.searchfield?view=azure-python
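As a rough sketch of that change, the analyzer name could be chosen from the document language instead of being hard-coded. The mapping and the pick_analyzer() helper below are hypothetical and not part of prepdocs.py; the analyzer names themselves are ones Azure AI Search provides.

```python
# Hypothetical mapping from a document language tag to an Azure AI Search
# analyzer name; extend it for the languages in your own documents.
ANALYZER_BY_LANGUAGE = {
    "en": "en.microsoft",
    "zh-Hant": "zh-Hant.microsoft",  # Traditional Chinese
    "zh-Hans": "zh-Hans.microsoft",  # Simplified Chinese
}

# Language-agnostic fallback, as suggested in this comment.
DEFAULT_ANALYZER = "standard.lucene"


def pick_analyzer(language: str) -> str:
    """Return the analyzer name for a language tag, or the fallback."""
    return ANALYZER_BY_LANGUAGE.get(language, DEFAULT_ANALYZER)


# In create_search_index() the result would replace the hard-coded value:
# SearchableField(name="content", type="Edm.String",
#                 analyzer_name=pick_analyzer("zh-Hant"))
print(pick_analyzer("zh-Hant"))
print(pick_analyzer("fr"))
```

Note that the analyzer of an existing field cannot be changed in place, so the index would likely need to be deleted and rebuilt with prepdocs after this edit.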

Hope this works for you.

XunLi-Nick commented 1 year ago

You can try modifying the prompt in app\backend\approaches\chatreadretrieveread.py by changing "If the question is not in English, translate the question to English before generating the search query." to "Please search in the language of the original input of the question, never try to translate it into English."

iMicknl commented 11 months ago

> If the question is not in English, translate the question to English before generating the search query.

It would be great if this were not part of the default template. Many customers who struggle with Azure OpenAI On Your Data and non-English documents will have a look at this accelerator.

pamelafox commented 11 months ago

Hm, good point. We could alter the prompt based off the query_language parameter? That presumably would reflect the language of the documents in the search index.

I can also flesh out our section about query_language into a whole doc, and suggest tweaking this part of the prompt.
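One way the prompt could depend on query_language, as a minimal sketch only: the search_query_instruction() helper and the non-English wording are assumptions for illustration, not the repo's actual code. Only the English instruction is taken from the prompt quoted in this thread.

```python
# The instruction quoted earlier in this thread.
TRANSLATE_TO_ENGLISH = (
    "If the question is not in English, translate the question to English "
    "before generating the search query."
)


def search_query_instruction(query_language: str) -> str:
    """Hypothetical helper: choose the query-generation instruction."""
    # Treat "en", "en-us", "en-GB", ... as English.
    if query_language.lower().split("-")[0] == "en":
        return TRANSLATE_TO_ENGLISH
    # Assumed wording for non-English indexes; adjust to taste.
    return (
        "Generate the search query in the language of the indexed documents "
        f"({query_language}); do not translate the question to English."
    )


print(search_query_instruction("en-us"))
print(search_query_instruction("zh-Hant"))
```

With this shape, an English index keeps the current behavior, while any other query_language value drops the translation step.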