Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License

The App can't understand the Chinese documents #57

Open tzuhsin0329 opened 1 year ago

tzuhsin0329 commented 1 year ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. Build a Windows 10 VM in Azure and use it to deploy the app.
  2. Run azd login, then azd init -t azure-search-openai-demo.
  3. Put the Chinese documents in the ./data folder.
  4. Run azd up and ./script/prepdocs.ps1 to split the documents, upload them to blob storage, and insert the information into the search index.
  5. Open the endpoint and start using the app.
  6. Issue: the app can't answer questions based on the Chinese documents.

Any log messages given by the failure

No.

Expected/desired behavior

Currently the app can only read English documents and answer questions about them. It should also be able to read Chinese documents and answer questions about them.

OS and Version?

Windows 10

Versions

0eed6114d0

Mention any other details that might be useful

I tried changing the analyzer in the Cognitive Search service to Chinese, but the app still can't read Chinese documents, even though the search service itself can. I verified this by querying the search service directly in the Azure portal.
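
For readers unsure what "changing the analyzer" involves, here is a minimal sketch of a search index field definition expressed as a plain Python dict. The field name `content` and the helper `content_field` are assumptions made for illustration, not taken from this issue. The analyzer names `en.microsoft` and `zh-Hans.microsoft` are language analyzers offered by the search service.

```python
import json


def content_field(analyzer: str) -> dict:
    """Build a searchable string field that uses the given language analyzer.

    Hypothetical helper; the field name "content" is an assumption.
    """
    return {
        "name": "content",
        "type": "Edm.String",
        "searchable": True,
        "analyzer": analyzer,
    }


english_field = content_field("en.microsoft")
chinese_field = content_field("zh-Hans.microsoft")  # Simplified Chinese

print(json.dumps(chinese_field, indent=2))
```

The analyzer only controls how document text and query text are tokenized inside the search service. It does not change what query the app sends to the service.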


Thanks! We'll be in touch soon.

sissimonster commented 1 year ago

I didn't make any changes, but mine seems to be able to read both. May I know what response you got when you asked a question in Chinese?

XunLi-Nick commented 1 year ago

You can try modifying the prompt in app\backend\approaches\chatreadretrieveread.py by changing "If the question is not in English, translate the question to English before generating the search query." to "Please search in the language of the original input of the question, never try to translate it into English."
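
The substitution described above could be sketched roughly as follows. The two instruction strings are quoted from this comment. The helper `keep_original_language` and the shortened `example_prompt` are hypothetical stand-ins, not the app's actual code or prompt.

```python
ORIGINAL_INSTRUCTION = (
    "If the question is not in English, translate the question to English "
    "before generating the search query."
)
REPLACEMENT_INSTRUCTION = (
    "Please search in the language of the original input of the question, "
    "never try to translate it into English."
)


def keep_original_language(prompt_template: str) -> str:
    """Swap the translate-to-English instruction for the keep-language one."""
    if ORIGINAL_INSTRUCTION not in prompt_template:
        # Fail loudly so a silently unchanged prompt is not mistaken for a fix.
        raise ValueError("expected instruction not found in prompt template")
    return prompt_template.replace(ORIGINAL_INSTRUCTION, REPLACEMENT_INSTRUCTION)


# Hypothetical, shortened prompt used only to exercise the helper.
example_prompt = (
    "Generate a search query based on the conversation and the new question.\n"
    + ORIGINAL_INSTRUCTION
)
patched_prompt = keep_original_language(example_prompt)
print(patched_prompt)
```

In practice the sentence would probably be edited directly in the source file. The helper only makes the before and after text explicit.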

danwalsh-ses commented 1 year ago

I added the following function to chatreadretrieveread.py:

    import unicodedata

    def is_chinese_query(q):
        # Return True as soon as one character looks like a CJK character.
        for char in q:
            category = unicodedata.category(char)
            if category == 'Lo' or 'CJK' in category:
                return True
        return False

Then, in Step 2 in the same file, I changed the self.search_client.search call so that the query_language and query_speller parameters check whether the query is Chinese:

    r = self.search_client.search(q,
                                  filter=filter,
                                  query_type=QueryType.SEMANTIC,
                                  query_language="zh-cn" if is_chinese else "en-us",
                                  query_speller="lexicon" if not is_chinese else None,
                                  semantic_configuration_name="default",
                                  top=top,
                                  query_caption="extractive|highlight-false" if use_semantic_captions else None)

This should also cover Korean and Japanese since it's checking CJK, but I still need to test that. It is working for Chinese documents now. However, it returns answers in English, so I am updating the prompt to tell it to answer in the original input language. I still need to confirm this.
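
One caveat about the check in this comment: `unicodedata.category` returns two-letter codes such as `Lo`, so a `'CJK' in category` test cannot match. The `Lo` test alone also matches letters from scripts such as Arabic, Hebrew, and Thai. Below is a stricter sketch that inspects Unicode character names instead. It is an alternative technique, not the commenter's method. `contains_han` and `search_language_options` are hypothetical names, and the `zh-cn` and `lexicon` values are copied from the comment.

```python
import unicodedata

# Unicode names of Han characters start with one of these prefixes.
HAN_NAME_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")


def contains_han(text: str) -> bool:
    """Return True if any character in text is a Han (Chinese) ideograph.

    Japanese text that contains kanji also matches, because kanji are
    encoded as the same CJK ideographs.
    """
    for char in text:
        # name() returns the default "" for characters without a name.
        if unicodedata.name(char, "").startswith(HAN_NAME_PREFIXES):
            return True
    return False


def search_language_options(query: str) -> dict:
    """Pick the language-dependent search parameters for a query."""
    is_chinese = contains_han(query)
    return {
        "query_language": "zh-cn" if is_chinese else "en-us",
        # The comment above disables the speller for Chinese queries.
        "query_speller": None if is_chinese else "lexicon",
    }


print(search_language_options("什么是保险"))
print(search_language_options("What is insurance?"))
```

The returned dict could be unpacked into the search call with `**`, assuming the call accepts those keyword arguments as shown in the comment.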

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.