
awslabs / amazon-transcribe-streaming-sdk

The Amazon Transcribe Streaming SDK is an async Python SDK for converting audio into text via Amazon Transcribe.

Apache License 2.0


Problem with Chinese language: duplicated words are ignored #102

Closed: hoanguyen401 closed this issue 5 months ago

hoanguyen401 commented 6 months ago

Hello, I have a problem with Chinese when streaming a local audio file: a repeated word is dropped from the transcript. For example, the content of my audio file is 验证您的码为一三验证一七二五五, which has two 五 ("five") characters after 二 ("two"). But AWS drops one of the 五, so the transcript result is 验证您的码为一三验证一七二五. Does anyone have an idea how to fix this? Thanks.
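A quick way to confirm exactly what was lost is to diff the expected text against the transcript. This is a minimal sketch using Python's standard `difflib` module, which is not part of the Transcribe SDK. `dropped_chars` is a hypothetical helper name, and the two strings are the ones from the report.

```python
import difflib
from typing import List, Tuple


def dropped_chars(expected: str, transcript: str) -> List[Tuple[int, str]]:
    """Return (index_in_expected, text) for spans of `expected` missing from `transcript`."""
    # autojunk=False so no character is ever treated as "junk" by the matcher
    matcher = difflib.SequenceMatcher(a=expected, b=transcript, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'delete' and 'replace' opcodes cover text in `expected` that is absent from `transcript`
        if tag in ("delete", "replace"):
            missing.append((i1, expected[i1:i2]))
    return missing


expected = "验证您的码为一三验证一七二五五"
transcript = "验证您的码为一三验证一七二五"
print(dropped_chars(expected, transcript))  # [(14, '五')]
```

Because the two 五 characters are identical, the diff can only report that one of them is missing; it attributes the loss to the last one (index 14).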

hoanguyen401 commented 5 months ago

I found the solution.