Picovoice / rhino

On-device Speech-to-Intent engine powered by deep learning
https://picovoice.ai/
Apache License 2.0
629 stars 82 forks

output confidence for inference #290

Closed shilovav3 closed 2 years ago

shilovav3 commented 2 years ago

Hi,

I am developing multi-language intent detection using Rhino. I feed the same audio stream to Rhino models for different languages. Unfortunately, some phrases are detected by more than one language model. For instance, the German phrase "Wasser auf" ("Water on" in English) is sometimes detected by the English Rhino model as "Water off" (the opposite meaning), alongside the correct detection by the German model. Adjusting the sensitivity parameter hasn't resolved the problem. Is it possible to compare the detections? Do you have something like a "detection probability" value that estimates the quality of a detection? I program in C. Can such a value be accessed via the Rhino pointer?

I would very much appreciate any help you can provide.

Maybe you could extend the libpv_rhino.so library with a function that can extract such a value.

Thank you!

Best regards, Alex.

kenarsa commented 2 years ago

@shilovav3 thank you for reporting. do you have a wake word before rhino or is it just rhino running? if you do have a wake word, you might want to switch the rhino context (language) based on the wake word triggered. this is just an idea. the false alarms in this situation are expected and I don't see an immediate fix. but I like the idea of outputting a confidence metric. I'll change the title and keep this open for future releases.
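The context-switching idea above can be sketched as a small lookup table. This is only an illustration under stated assumptions: the wake words, language codes, context file paths, and the `route_for_keyword` helper are all hypothetical, and the sketch assumes a separate wake-word engine reports which keyword fired as a zero-based index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing entry: which Rhino context (language) should
 * handle the command that follows a given wake word. */
typedef struct {
    const char *wake_word;    /* label only, e.g. for logging */
    const char *language;     /* language code of the context */
    const char *context_path; /* placeholder path to a context file */
} language_route_t;

/* Placeholder table; every value here is made up for illustration. */
static const language_route_t ROUTES[] = {
    {"computer", "en", "contexts/water_en.rhn"},
    {"hallo",    "de", "contexts/water_de.rhn"},
};

#define NUM_ROUTES ((int32_t) (sizeof(ROUTES) / sizeof(ROUTES[0])))

/* Returns the route for a detected wake-word index, or NULL when the
 * index is out of range (for example a negative "nothing detected"). */
static const language_route_t *route_for_keyword(int32_t keyword_index) {
    if (keyword_index < 0 || keyword_index >= NUM_ROUTES) {
        return NULL;
    }
    return &ROUTES[keyword_index];
}
```

With a table like this, only the Rhino instance for the selected language would receive the audio that follows the wake word, so two language models never process the same command.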

shilovav3 commented 2 years ago

Hi Alireza,

Thank you for the prompt reply! It is just Rhino, with Rhino models for several languages processing the incoming audio in parallel. There is no wake word, and one cannot be used because this runs in a multi-language environment. This is why I need to compare the detections made by the different language models. Does a model provide a detection probability (quality)? How can it be accessed?

Thank you!

Best regards, Alex.

kenarsa commented 2 years ago

It does not at the moment. I'll keep this open so that we can consider it as a feature candidate for upcoming releases.

shilovav3 commented 2 years ago

Hi Alireza,

Can the model data be accessed via the “struct pv_rhino *” pointer? Usually an AI model provides a detection probability.

If it cannot, could you please give an idea of when a new release with access to the detection probability might be issued? It is a very valuable feature.

Thank you!

Best regards, Alex.

kenarsa commented 2 years ago

no. the pointer is opaque, so you can't access the internals. I can't provide a timeline at this point.
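For readers unfamiliar with the term, here is a minimal sketch of the opaque-pointer pattern in C. It is not Rhino's source code: the `engine_t` type, its functions, and the `last_score` field are hypothetical and exist only to show why a caller holding such a pointer cannot read internal values unless the library exports an accessor for them.

```c
#include <stdlib.h>

/* ---- What a public header would expose (hypothetical API) ---- */
typedef struct engine engine_t; /* incomplete type: layout hidden from callers */
engine_t *engine_create(float sensitivity);
float engine_get_sensitivity(const engine_t *e);
void engine_delete(engine_t *e);

/* ---- What would stay private in the library's own .c file ---- */
struct engine {
    float sensitivity;
    float last_score; /* hypothetical internal value with no accessor */
};

engine_t *engine_create(float sensitivity) {
    engine_t *e = malloc(sizeof(*e));
    if (e != NULL) {
        e->sensitivity = sensitivity;
        e->last_score = 0.0f;
    }
    return e;
}

/* The only supported way for a caller to read state is an exported accessor. */
float engine_get_sensitivity(const engine_t *e) {
    return e->sensitivity;
}

void engine_delete(engine_t *e) {
    free(e);
}
```

In a real library the `struct engine` definition is compiled into the binary and never appears in the header, so client code that tries `e->last_score` does not compile. Exposing such a value therefore requires the library to add a new exported function.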

shilovav3 commented 2 years ago

Hi Alireza,

Could you please keep me updated about this detection probability feature and when it will be implemented? It would be a very valuable addition to Rhino.

Thank you!

Best regards, Alex.

cmcinroy commented 2 years ago

A use case where insight into the confidence of the inference might be useful is processing utterances from individuals with challenges that might affect enunciation.

Presumably, rhino has a default confidence threshold that it uses to determine whether an utterance is understood. I'm already way out of my depth here, but would allowing adjustment of that threshold help with the success rate of processing audio in the aforementioned use case?

laves commented 2 years ago

@cmcinroy - thanks for your insight. Rhino does have a sensitivity threshold that can be set using the sensitivity parameter in any of our SDKs. A higher sensitivity value would allow you to capture more enunciation variation, but it would come at the cost of potentially decreased inference accuracy.

As for the confidence metric, we are monitoring the interest in this feature via this issue and may include it in a future release if there is enough interest.

shilovav3 commented 2 years ago

I want to express my interest in the confidence metric. I am building a voice recognition program for a multi-language environment. Several models for different languages run in parallel. In some cases two models produce detections for the same phrase, and I need to compare the results in that case.

Playing with the sensitivity parameter doesn’t help much. It can help with one phrase (not always) and worsen the result for another.
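The situation described above can at least be detected in application code, even though it cannot be resolved by score. The sketch below is an assumption-laden illustration: `lang_result_t`, `arbitration_t`, and `arbitrate` are hypothetical names, and the caller is assumed to have already copied each language engine's outcome for one utterance into the array. Because Rhino exposes no confidence value (per the maintainers' replies in this thread), the sketch only classifies the outcome and refuses to pick a winner when more than one model reports a match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-language outcome for a single utterance. */
typedef struct {
    const char *language; /* e.g. "en", "de" */
    bool is_understood;   /* did this language's model match the utterance? */
    const char *intent;   /* intent name; meaningful only if is_understood */
} lang_result_t;

typedef enum {
    ARBITRATION_NONE,    /* no model understood the utterance */
    ARBITRATION_UNIQUE,  /* exactly one model understood it */
    ARBITRATION_CONFLICT /* two or more understood it; no score to rank them */
} arbitration_t;

/* Classifies the combined outcome. *winner is set only for a unique match. */
static arbitration_t arbitrate(
        const lang_result_t *results,
        int32_t num_results,
        const lang_result_t **winner) {
    int32_t num_understood = 0;
    *winner = NULL;
    for (int32_t i = 0; i < num_results; i++) {
        if (results[i].is_understood) {
            num_understood++;
            *winner = &results[i];
        }
    }
    if (num_understood == 0) {
        return ARBITRATION_NONE;
    }
    if (num_understood == 1) {
        return ARBITRATION_UNIQUE;
    }
    *winner = NULL; /* ambiguous: leave the decision to the application */
    return ARBITRATION_CONFLICT;
}
```

On `ARBITRATION_CONFLICT` an application might ask the user to repeat the command or ignore it, which is safer than acting on an intent with the opposite meaning.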

cmcinroy commented 2 years ago

> Rhino does have a sensitivity threshold that can be set using the sensitivity parameter in any of our SDKs. A higher sensitivity value would allow you to capture more enunciation variation, but it would come at the cost of potentially decreased inference accuracy.

Ah, yes. I recall seeing that... for some reason I had thought that related to hardware/microphone sensitivity. Thanks @laves for pointing that out, I will experiment with that.

> As for the confidence metric, we are monitoring the interest in this feature via this issue and may include it in a future release if there is enough interest.

Yes, appreciate that. Thank you!

kenarsa commented 2 years ago

I don't think we can include this in our immediate roadmap in the coming months, unfortunately. I am closing this right now but will re-open if priorities change on our end.