WICG / speech-api

Web Speech API
https://wicg.github.io/speech-api/

Add profanity/offensive words filter attribute #72

Open Ninajoy opened 4 years ago

Ninajoy commented 4 years ago

I am not sure whether I am on the right track about why curse words appear differently in the transcript of a SpeechRecognitionResult in different browsers, so I thought it best to open an issue here.

Question: If browsers implement the transcript of SpeechRecognitionResult in ways that produce different output, could a profanity filter attribute be useful, so that the developer using the API has a choice in the matter? For example, an offensiveWordFilter attribute of type boolean?

Background story: While experimenting with the SpeechRecognition interface in the phrase-matcher from https://github.com/mdn/web-speech-api/ I observed the following:

  1. When using Chrome and saying a curse word like shit, the transcript in SpeechRecognitionResult is censored as s****
  2. When using Firefox Nightly and saying a curse word like shit the transcript in SpeechRecognitionResult is not censored
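
To compare browsers on this point, a small check on the returned transcript can help. The sketch below is a heuristic only: the helper name looksMasked is hypothetical, and the pattern (a word character immediately followed by asterisks) is an assumption based on the single s**** example reported above.

```typescript
// Heuristic pattern: a word character immediately followed by one or more
// asterisks, e.g. "s****". Assumed from the one example in this report;
// it can give false positives on text such as "a*b".
const MASKED_WORD = /\b\w\*+/;

// Hypothetical helper: returns true when the transcript appears to contain
// a masked word.
function looksMasked(transcript: string): boolean {
  return MASKED_WORD.test(transcript);
}

// In a browser, the transcript would come from the recognition result, e.g.:
//   recognition.onresult = (event) => {
//     const transcript = event.results[0][0].transcript;
//     console.log(transcript, looksMasked(transcript));
//   };
```

Logging this value in each browser for the same spoken phrase makes the difference visible without reading the transcript by eye.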

Neither Chrome nor Nightly applies this type of censoring to the speechSynthesis interface as used in the speak-easy-synthesis.

While searching for why this happens, I found the following. In https://github.com/chromium/chromium/blob/master/content/browser/speech/speech_recognition_engine.cc filter_profanities is set to false on line 277, and on line 579 this should result in pFilter=0. According to https://stackoverflow.com/questions/15030339/remove-profanity-censor-from-google-speech-recognition/15071054 setting pfilter=0 removes the profanity filter. This could lead to the conclusion that in Chrome this setting is changed somewhere. I do not feel confident in this conclusion, however.

In Nightly I have found no reference to a profanity filter in the code: https://dxr.mozilla.org/mozilla-central/source/dom/media/webspeech/recognition

marcoscaceres commented 4 years ago

That seems like a bug in Chrome (or Google's speech service). The recognition engine should be profanity agnostic. The consuming application should then do its own filtering.
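
As an illustration of filtering on the application side, here is a minimal sketch. It assumes the application supplies its own blocklist; the function name maskOffensiveWords, the masking style (first letter kept, the rest replaced by asterisks), and the ASCII-only word pattern are all assumptions made for this example and are not part of the Web Speech API.

```typescript
// Hypothetical helper: masks every word of the transcript that appears in an
// application-supplied blocklist (lowercase entries). Only ASCII letters and
// apostrophes are treated as word characters, which is a simplification.
function maskOffensiveWords(
  transcript: string,
  blocklist: ReadonlySet<string>
): string {
  return transcript.replace(/[A-Za-z']+/g, (word) =>
    blocklist.has(word.toLowerCase())
      ? word.charAt(0) + "*".repeat(word.length - 1)
      : word
  );
}

// In a browser, this would be applied to the unfiltered transcript, e.g.:
//   const shown = maskOffensiveWords(event.results[0][0].transcript, blocklist);
```

With this split, the engine returns what was said and each application decides whether, and how, to mask it.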

Ninajoy commented 4 years ago

Thank you for your answer.

I can see a bug was registered for this in chromium: https://bugs.chromium.org/p/chromium/issues/detail?id=804812&q=speech%20censored&colspec=ID%20Pri%20M%20Stars%20ReleaseBlock%20Component%20Status%20Owner%20Summary%20OS%20Modified

In the HTML Speech Incubator document, chapter 7.1.2.3 Builtin Default Grammars, at https://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/ the following was included: "It is recommended that speech services support a filter parameter that can be set to the value noOffensiveWords to represent a desire to not recognize offensive words."

Would it therefore be helpful, to prevent further misunderstandings on this subject, to change my request so that the speech-api documentation states that the engine should be profanity agnostic?

evanbliu commented 4 days ago

FYI, Chrome just updated its implementation of the Web Speech API to remove profanity masking. This change will take effect in release M127.