Azure-Samples / Cognitive-Speech-STT-Windows

Windows SDK for the Microsoft Speech-to-Text API, part of Cognitive Services
https://www.microsoft.com/cognitive-services/en-us/speech-api

The demo application is not processing entire file #47

Closed mrctito closed 6 years ago

mrctito commented 6 years ago

Hi!

I modified the x86 demo to use the pt-BR language, and sometimes it does not process the entire WAV file.

The speech in that WAV file is slow.

Could you help me?

Thank you

zhouwangzw commented 6 years ago

Which recognition mode was used, ShortPhrase or LongDictation? ShortPhrase mode supports an utterance up to 15 seconds long, and LongDictation mode supports an utterance up to 2 minutes long. Please refer to the page https://docs.microsoft.com/en-us/azure/cognitive-services/speech/getstarted/getstartedcsharpdesktop for details.
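
A quick way to rule out the length limit is to measure the file before sending it. The following is a minimal sketch, assuming an uncompressed PCM WAV file; the limits are the ones quoted in this comment, and the function names are hypothetical helpers, not part of the SDK.

```python
import wave

# Limits as quoted in this thread; treat them as assumptions for other SDK versions.
MODE_LIMIT_SECONDS = {"ShortPhrase": 15.0, "LongDictation": 120.0}


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_mode(path, mode):
    """Return True if the file is no longer than the limit quoted for the mode."""
    return wav_duration_seconds(path) <= MODE_LIMIT_SECONDS[mode]
```

For example, `fits_mode("sample.wav", "LongDictation")` would be expected to return `True` for a file shorter than two minutes.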

mrctito commented 6 years ago

I am using LongDictation mode. I have already read the documentation.

The audio in question is less than 1 minute long.

Thank you.

zhouwangzw commented 6 years ago

Could you please share the log output? What is the audio codec format of the file? Does the audio contain long periods of silence?
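
To answer the silence question with a number rather than by ear, the longest quiet stretch can be measured directly. This is only a sketch, assuming 16-bit mono PCM; the amplitude `threshold` of 500 is an arbitrary illustrative value, not something the service documents.

```python
import array
import sys
import wave


def longest_silence_seconds(path, threshold=500):
    """Longest run of samples whose absolute amplitude is below threshold, in seconds."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    longest = run = 0
    for value in samples:
        # Extend the current quiet run, or reset it when a loud sample appears.
        run = run + 1 if abs(value) < threshold else 0
        longest = max(longest, run)
    return longest / rate
```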

mrctito commented 6 years ago

Yes, it contains long periods of silence.

OK, I will share file info later.

Thank you so much

mrctito commented 6 years ago

Hi!

I am attaching the log and the WAV file. Unfortunately, they are in Portuguese.

This is the final text: O atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que no final prevaleceu o atendimento

Thank you!

--- Start speech recognition using long wav file with LongDictation mode in pt-BR language ----

--- Partial result received by OnPartialResponseReceivedHandler() --- e

--- Partial result received by OnPartialResponseReceivedHandler() --- se

--- Partial result received by OnPartialResponseReceivedHandler() --- o

--- Partial result received by OnPartialResponseReceivedHandler() --- por

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo for a

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos e

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos no

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atender

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos no contra

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atendê-lo

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade que

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficará o

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficará o problema

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficará o probleminha

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas a

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos pra

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas de má

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas maaz

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometer

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram a

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que não

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho q

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que no final

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que no final do

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que no final prevaleceu

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que no final prevaleceu a

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que no final prevaleceu o

--- Partial result received by OnPartialResponseReceivedHandler() --- o atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que no final prevaleceu o entendimento

--- OnDataDictationResponseReceivedHandler --- Final n-BEST Results [0] Confidence=None, Text="O atendimento foi bom é 1 bom prazo foram rápidos atenderam atividade ficaram uns probleminhas alguns pequenos problemas no comprometeram eu acho que no final prevaleceu o atendimento"

--- Partial result received by OnPartialResponseReceivedHandler() --- o

--- Partial result received by OnPartialResponseReceivedHandler() --- na

--- Partial result received by OnPartialResponseReceivedHandler() --- no

--- Partial result received by OnPartialResponseReceivedHandler() --- no médio

--- Partial result received by OnPartialResponseReceivedHandler() --- no média geral

--- Partial result received by OnPartialResponseReceivedHandler() --- no média geral do

--- Partial result received by OnPartialResponseReceivedHandler() --- no geral o efeito sim

--- Partial result received by OnPartialResponseReceivedHandler() --- no geral o efeito sobre

--- Partial result received by OnPartialResponseReceivedHandler() --- no geral o efeito sim obrigado

--- OnDataDictationResponseReceivedHandler --- Final n-BEST Results [0] Confidence=None, Text="No geral o efeito sim obrigado"

--- OnDataDictationResponseReceivedHandler --- No phrase response is available.
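
Note that the log above contains two final results, while the "final text" quoted earlier in this comment matches only the first one. In this log, LongDictation mode appears to deliver one final result per utterance, so the full transcript is the concatenation of all of them. The sketch below extracts and joins them from log text in the format shown above; the regular expression is derived from these log lines only, and the function names are hypothetical.

```python
import re

# Matches the top ([0]) n-best entry of each final result line shown in the log.
FINAL_TEXT = re.compile(r'Final n-BEST Results \[0\] Confidence=\w+, Text="([^"]*)"')


def final_phrases(log_text):
    """Return the text of every final result, in the order it was logged."""
    return FINAL_TEXT.findall(log_text)


def full_transcript(log_text):
    """Join all final results into one transcript string."""
    return " ".join(final_phrases(log_text))
```

Partial results and the closing "No phrase response is available." line do not match the pattern and are ignored.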

mrctito commented 6 years ago

Dear sirs,

I have several more examples; I can send them to you if you would like.

Thank you.

mrctito commented 6 years ago

Hi!

How are you? I hope everything is fine!

I am just getting in touch to check whether I sent all the requested information correctly.

I think the title of my issue does not describe the problem properly. In other words, my problem is that the transcribed text is very different from the audio content.

Thank you so much!

zhouwangzw commented 6 years ago

Do you mean that the word error rate is high, or that the text does not match the audio at all? I have seen the log output you attached; could you please also send us the audio file and the expected transcript?

Thank you!

mrctito commented 6 years ago

Hi!

I mean that the text does not match the audio at all.

Yes, I can send you the audio and the expected transcription, but they are sensitive documents. Can I send them to a private email address?

Thank you!

Marcos Tito P. Marques

zhouwangzw commented 6 years ago

If the text does not match the audio at all, it could be that the language setting is incorrect, or that the audio format is not supported. What is the format of your audio file? If you record an audio file in WAV format with single-channel (mono) PCM encoding at 16 kHz, do you get a correct transcription result?
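
The format question can be answered without an audio editor. Below is a minimal sketch using Python's standard `wave` module, which reads only uncompressed PCM and raises `wave.Error` for other encodings; the function names are hypothetical, and the defaults simply mirror the mono, 16 kHz format suggested in this comment.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, and bit depth of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def matches_suggested_format(path, sample_rate_hz=16000, channels=1):
    """True if the file has the channel count and sample rate suggested above."""
    info = describe_wav(path)
    return info["channels"] == channels and info["sample_rate_hz"] == sample_rate_hz
```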

mrctito commented 6 years ago

Hi!

If you don't mind, I would like to send the WAV file to you.

Thank you.

Marcos Tito P. Marques

mrctito commented 6 years ago

Hi!

I just checked the audio file, and it is 8 kHz.

Will it work if I convert it to 16 kHz, or do I need to record the source speech again?

Thank you!

Marcos Tito P. Marques

zhouwangzw commented 6 years ago

You can convert your file to 16 kHz for testing, but the audio quality might not be good, since the original sample rate is only 8 kHz. If possible, it would be better to record the audio again at 16 kHz.
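
For a quick test conversion, the sample rate can be doubled by inserting an interpolated sample between each pair of neighbours. The sketch below assumes 16-bit mono PCM and uses plain linear interpolation; `upsample_2x` is a hypothetical helper, and a dedicated resampling tool would normally give better results. Upsampling cannot restore frequency content above 4 kHz that an 8 kHz recording never captured, which is why re-recording at 16 kHz is preferable.

```python
import array
import sys
import wave


def upsample_2x(src_path, dst_path):
    """Double the sample rate of a 16-bit mono PCM WAV by linear interpolation."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    out = array.array("h")
    for i, current in enumerate(samples):
        # The last sample has no right-hand neighbour, so it is repeated.
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + nxt) // 2)  # midpoint between neighbours

    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate * 2)
        dst.writeframes(out.tobytes())
```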

mrctito commented 6 years ago

Ok, I will try.

Thank you so much!

zhouwangzw commented 6 years ago

I am closing the issue. If you still see the problem, please reopen it.

mrctito commented 6 years ago

Thank you!
