Open andycaho opened 1 year ago
In the file ./scripts/prepdocs.py, you should change the function create_search_index() so that it creates an index that can be searched in Chinese or other languages. By default, the analyzer is set to English:
SearchableField(name="content", type="Edm.String", analyzer_name="en.microsoft"),
I suggest changing the code to set analyzer_name="standard.lucene", which seems to work properly for common languages. For more information on the available languages, refer to the docs: https://learn.microsoft.com/en-us/python/api/azure-search-documents/azure.search.documents.indexes.models.searchfield?view=azure-python
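If you want a language-specific analyzer rather than the generic one, something like the following might work as a starting point. This is only a sketch: the mapping and the helper names pick_analyzer and content_field are made up for illustration, and the dict mirrors the REST-style field definition rather than the SDK class. In prepdocs.py the chosen value would go into analyzer_name= on the SearchableField shown above.

```python
# Hypothetical helper: choose an Azure AI Search analyzer for the "content" field.
# The analyzer names are existing Azure AI Search analyzers; the mapping and
# function names are assumptions for illustration, not part of the repo.
LANGUAGE_ANALYZERS = {
    "en": "en.microsoft",
    "zh-Hans": "zh-Hans.microsoft",  # Simplified Chinese
    "zh-Hant": "zh-Hant.microsoft",  # Traditional Chinese
    "ja": "ja.microsoft",
}

# Language-agnostic fallback suggested in this thread.
FALLBACK_ANALYZER = "standard.lucene"


def pick_analyzer(language: str) -> str:
    """Return the analyzer for a language code, or the generic fallback."""
    return LANGUAGE_ANALYZERS.get(language, FALLBACK_ANALYZER)


def content_field(language: str) -> dict:
    """Build a REST-style definition of the searchable content field."""
    return {
        "name": "content",
        "type": "Edm.String",
        "searchable": True,
        "analyzer": pick_analyzer(language),
    }


if __name__ == "__main__":
    print(content_field("zh-Hans"))
```

Note that the analyzer of an existing field cannot be changed in place, so the index has to be recreated and the documents re-indexed after switching.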
Hope this works for you.
You can try modifying the prompt in app\backend\approaches\chatreadretrieveread.py by changing "If the question is not in English, translate the question to English before generating the search query." to "Please search in the language of the original input of the question, never try to translate it into English."
If the question is not in English, translate the question to English before generating the search query.
It would be great if this were not part of the default template. Many customers who struggle with Azure OpenAI on your Data and non-English documents will have a look at this accelerator.
Hm, good point. We could alter the prompt based off the query_language parameter? That presumably would reflect the language of the documents in the search index.
I can also flesh out our section about query_language into a whole doc, and suggest tweaking this part of the prompt.
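A rough sketch of what switching the prompt on query_language could look like is below. The function name query_language_instruction and the "starts with en" check are assumptions for illustration, not the repo's actual code; the two instruction strings are the ones quoted earlier in this thread.

```python
# Hypothetical sketch: pick the prompt instruction from the query_language setting.
TRANSLATE_INSTRUCTION = (
    "If the question is not in English, translate the question to English "
    "before generating the search query."
)
KEEP_LANGUAGE_INSTRUCTION = (
    "Please search in the language of the original input of the question, "
    "never try to translate it into English."
)


def query_language_instruction(query_language: str) -> str:
    """Return the instruction that matches the language of the indexed documents."""
    # Assumption: values look like "en-us" or "zh-cn", so an "en" prefix
    # means the index holds English documents.
    if query_language.strip().lower().startswith("en"):
        return TRANSLATE_INSTRUCTION
    return KEEP_LANGUAGE_INSTRUCTION


if __name__ == "__main__":
    print(query_language_instruction("zh-cn"))
```

With this, an English index keeps the current behavior, and any other query_language leaves the question in its original language.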
I have tried to build my own index and run the indexer with a Chinese analyzer, but it's not working. (I already set the env variable AZURE_SEARCH_INDEX to the new index.) The default content analyzer in gptkbindex is English.