Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Can I get pronunciation score using the sdk version? #618

Closed: ziadhassan7 closed this issue 4 years ago

ziadhassan7 commented 4 years ago

To get the pronunciation scores (PronScore, fluency, accuracy), do I have to use the REST API, or can I get them through the SDK?

I'm using the SDK on Android, and I was able to get the confidence score with this:

SpeechConfig config = SpeechConfig.fromSubscription(speechSubscriptionKey, serviceRegion);

config.setServiceProperty("wordLevelConfidence", "true", ServicePropertyChannel.UriQueryParameter);
config.setServiceProperty("format", "detailed", ServicePropertyChannel.UriQueryParameter);

But how can I get the other information, such as per-word pronunciation scores, etc.?

amitkumarshukla commented 4 years ago

@ziadhassan7 Thanks a lot for your query. We will get back to you asap.

amitkumarshukla commented 4 years ago

@ziadhassan7 The feature is under development. If you can explain your use case and business we can plan to prioritize it higher.

ziadhassan7 commented 4 years ago

@amitkumarshukla I'm working on something that helps non-native English speakers train and develop their speaking skills, so it would be great if I could show users how accurately they pronounced each word.

I've also been trying this against the REST API with Postman, to check whether it works there and to figure out which parameter to pass in the URL, so that maybe I could do the same through the SDK. But it didn't work for me there either!

I should pass pronunciationScoreParams as the parameter name and the Base64-encoded JSON as its value, right? Or should I convert the JSON to UTF-8 first?

I tried both and neither worked. I used these websites: to Base64 and to UTF8.

I also used the JSON example from the documentation, with the spaces removed:

{"ReferenceText":"Good morning.","GradingSystem":"HundredMark","Granularity":"FullText","Dimension":"Comprehensive"}

yinhew commented 4 years ago

@ziadhassan7 do you mind sharing the code you use to call the REST API, so that I can check where the issue lies? Also, can you give an estimate of how much daily traffic (usage) your product will have? This will help us judge the priority of SDK support.

ziadhassan7 commented 4 years ago

@yinhew I haven't written the code yet; I'm using a tool called Postman to test the API first.

I converted this JSON {"ReferenceText":"Good morning."} to this UTF-8 code: \x7b\x22\x52\x65\x66\x65\x72\x65\x6e\x63\x65\x54\x65\x78\x74\x22\x3a\x22\x47\x6f\x6f\x64\x20\x6d\x6f\x72\x6e\x69\x6e\x67\x2e\x22\x7d and then from UTF-8 to Base64:

XHg3Ylx4MjJceDUyXHg2NVx4NjZceDY1XHg3Mlx4NjVceDZlXHg2M1x4NjVceDU0XHg2NVx4Nzhc
eDc0XHgyMlx4M2FceDIyXHg0N1x4NmZceDZmXHg2NFx4MjBceDZkXHg2Zlx4NzJceDZlXHg2OVx4
NmVceDY3XHgyZVx4MjJceDdk

So now the URL looks like this: https://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US&Ocp-Apim-Subscription-Key=[API-KEY]&Content-Type=audio/wav; codecs=audio/pcm; samplerate=16000&pronunciationScoreParams=XHg3Ylx4MjJceDUyXHg2NVx4NjZceDY1XHg3Mlx4NjVceDZlXHg2M1x4NjVceDU0XHg2NVx4Nzhc eDc0XHgyMlx4M2FceDIyXHg0N1x4NmZceDZmXHg2NFx4MjBceDZkXHg2Zlx4NzJceDZlXHg2OVx4 NmVceDY3XHgyZVx4MjJceDdk

And this is the result:

{
    "RecognitionStatus": "Success",
    "DisplayText": "Figure 2 shows an example of processing of speech using human ear models. He acted swiftly when his time came, taking all Miller. Last year the NBA playoffs. The boys humiliated themselves against Detroit control activities. It's important for the first time in 10 years research and development spending in the federal budget program. She's proprietary Macintosh graphic displays.",
    "Offset": 200000,
    "Duration": 214200000
}

yinhew commented 4 years ago

@ziadhassan7 can you replace "conversation" in the URL with "interactive" and try again? We currently don't support the pronunciation assessment feature on the "conversation" route; only the "interactive" route is supported. Sorry for the misleading information in the doc. Support for the "conversation" route should come out in the coming 1~2 weeks.
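
For readers following along, the route change can be sketched in Java as below. This is a minimal sketch, not an official client: `AssessmentUrl` and `buildUrl` are hypothetical names, and the host pattern, route, and `pronunciationScoreParams` parameter are taken from the URL quoted in this thread. It assumes the subscription key and Content-Type are sent as request headers rather than in the query string, and it needs Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the Speech SDK.
final class AssessmentUrl {
    // Route segment named in this thread as the one that supports assessment.
    static final String ROUTE = "interactive";

    // encodedParams is the Base64 text of the assessment JSON.
    static String buildUrl(String region, String language, String encodedParams) {
        return "https://" + region + ".stt.speech.microsoft.com"
                + "/speech/recognition/" + ROUTE + "/cognitiveservices/v1"
                + "?language=" + URLEncoder.encode(language, StandardCharsets.UTF_8)
                // Base64 may contain '+', '/' and '=', so percent-encode the value.
                + "&pronunciationScoreParams="
                + URLEncoder.encode(encodedParams, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "e30=" is the Base64 form of "{}", used here only as a placeholder.
        System.out.println(buildUrl("westus", "en-US", "e30="));
    }
}
```

Percent-encoding the value also avoids the stray spaces that appear when a line-wrapped Base64 string is pasted into a URL.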

ziadhassan7 commented 4 years ago

@yinhew OK, this is great! Now it says: Unexpected json in 'pronunciationScoreParams' parameter. Unexpected character encountered while parsing value: . Path '', line 0, position 0. Please make sure the JSON string is encoded with UTF8 and then base64.

These are the websites I used: UTF-8 Encoder and UTF-8 to Base64. Maybe there is a setting on those sites that I used wrong. Any suggestions?

Also, is there any workaround for using pronunciation assessment in the SDK?

yinhew commented 4 years ago

@ziadhassan7 I looked at your base64 string. I found it's generated from the literal string "\x7b\x22\x52\x65\x66\x65\x72\x65\x6e\x63\x65\x54\x65\x78\x74\x22\x3a\x22\x47\x6f\x6f\x64\x20\x6d\x6f\x72\x6e\x69\x6e\x67\x2e\x22\x7d", that is, the escape sequences as text rather than the bytes they represent. This is not the right way to do the base64 encoding.

For your JSON, the right base64 should be eyJSZWZlcmVuY2VUZXh0IjoiR29vZCBtb3JuaW5nLiJ9
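
The difference between the two strings can be reproduced in a few lines of Java. This is a minimal sketch using only `java.util.Base64` from the standard library; the class and method names are hypothetical. `encode` takes the UTF-8 bytes of the JSON text and Base64-encodes them, and `decode` reverses that, which shows the earlier string decoding to backslash-x escape text instead of JSON.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of the Speech SDK.
final class PronunciationParams {
    // UTF-8 bytes of the JSON text, then Base64.
    static String encode(String json) {
        byte[] utf8 = json.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Base64 back to text, useful for checking what was really encoded.
    static String decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"ReferenceText\":\"Good morning.\"}";
        // prints eyJSZWZlcmVuY2VUZXh0IjoiR29vZCBtb3JuaW5nLiJ9
        System.out.println(encode(json));
        // First 12 characters of the earlier string; prints \x7b\x22\
        System.out.println(decode("XHg3Ylx4MjJc"));
    }
}
```

A Java String is already Unicode text, so "encoded with UTF8 and then base64" only means taking the UTF-8 bytes before the Base64 step; no separate escape-sequence conversion is involved.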

ziadhassan7 commented 4 years ago

Thanks, Mr. @yinhew! That worked.

It's weird that it says "Please make sure the JSON string is encoded with UTF8 and then base64.", but it worked by encoding the JSON to Base64 directly. Maybe the converter did the UTF-8 step by itself.

Again, thanks a lot, and I hope the SDK supports pronunciationScoreParams soon. :D

fabswt commented 2 years ago

It's unclear to me... is dimension=Comprehensive now available in the Python SDK or not?