OpenVoiceOS / ovos-tts-plugin-mimic3-server

Apache License 2.0

"Say-as interpret-as" #8

Open ChanceNCounter opened 1 year ago

ChanceNCounter commented 1 year ago

At least one of these is being processed literally by TTS, rather than parsed:

https://github.com/OpenVoiceOS/ovos-tts-plugin-mimic3-server/blob/009def97295ff2971d78231b582bb9ef8f45b3f0/ovos_tts_plugin_mimic3_server/__init__.py#L170-L175

Hence, when it reads a pairing code, you get

say as interpret as spell out Q say as...

This is both extremely annoying and entirely unhelpful, but fortunately it's also a bug =P
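
For anyone trying to picture the symptom, here is a minimal sketch of the difference between reading the markup literally and parsing it. It only simulates the effect with the standard library; it is not Mimic's or gruut's actual text processing, and the function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Example markup of the kind shown in the linked snippet (assumed shape).
SSML = '<speak><say-as interpret-as="spell-out">Q</say-as></speak>'


def read_as_plain_text(text):
    """Simulate an engine that treats markup as ordinary text.

    Punctuation is dropped and every alphanumeric run is "spoken",
    so tag and attribute names end up in the output.
    """
    return " ".join(re.findall(r"[A-Za-z0-9]+", text))


def read_as_ssml(text):
    """Parse the markup and keep only the text content of the elements."""
    root = ET.fromstring(text)
    return "".join(root.itertext())


print(read_as_plain_text(SSML))
print(read_as_ssml(SSML))
```

The first call yields the tag names interleaved with the letter, which matches what I hear; the second yields only the letter.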

ChanceNCounter commented 1 year ago

Assigned to Jarbas because I don't know what that's supposed to parse to, and git blames Jarbas

JarbasAl commented 1 year ago

taken directly from upstream, no idea what the reasoning behind that is

ChanceNCounter commented 1 year ago

@MaxBachmann Sorry to bother you, but it looks like this is being passed to gruut and I can’t spot a problem with the input.

Am I missing something, or is this internal to Mimic, rather than a bug in the above snippet or in gruut?

maxbachmann commented 1 year ago

It has been a while since I looked into these projects. Gruut appears to support this SSML:

>>> next(gruut.sentences('<say-as interpret-as="spell-out">A</say-as>', ssml=True, lang="en"))
Sentence(idx=0, text='A', text_with_ws='A', text_spoken='A', par_idx=0, lang='en', voice='', words=[Word(idx=0, text='A', text_with_ws='A', leading_ws='', trailing_ws='', sent_idx=0, par_idx=0, lang='en', voice='', pos='DT', phonemes=['ˈeɪ'], is_major_break=False, is_minor_break=False, is_punctuation=False, is_break=False, is_spoken=True, pause_before_ms=0, pause_after_ms=0, marks_before=None, marks_after=None)], pause_before_ms=0, pause_after_ms=0, marks_before=None, marks_after=None)

@synesthesiam is probably the right person to ask on this topic

goldyfruit commented 1 year ago

I have the same issue with my Sonos skill. The problem goes away when I switch to Polly.

https://github.com/smartgic/skill-sonos-controller/blob/main/utils.py#L110

synesthesiam commented 1 year ago

Gruut has different pronunciations for some individual letters, such as "A" in "A light turned on" vs. "A B C are letters" (pronounced "uh" vs "ey"). It doesn't try to guess the context, so this needs to be provided externally via SSML.
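
As a rough illustration of providing that context externally, a caller could wrap each character of a code in a spell-out tag before handing it to the TTS. This is a sketch only: `spell_out_ssml` is a hypothetical helper, not part of gruut, Mimic 3, or the plugin.

```python
from xml.sax.saxutils import escape


def spell_out_ssml(code):
    """Wrap each non-space character of `code` in a spell-out say-as tag.

    Hypothetical helper for illustration; characters are XML-escaped so
    that symbols such as "&" or "<" do not break the markup.
    """
    parts = [
        f'<say-as interpret-as="spell-out">{escape(ch)}</say-as>'
        for ch in code
        if not ch.isspace()
    ]
    return "<speak>" + " ".join(parts) + "</speak>"


print(spell_out_ssml("A1"))
```

The result is only useful if the engine is told to treat the input as SSML; otherwise the tags are read as text.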

ChanceNCounter commented 1 year ago

🙏 That was the hint I needed. I’m on a phone, but @JarbasAl @goldyfruit:

https://github.com/OpenVoiceOS/ovos-tts-plugin-mimic3-server/blob/f21094ba5d068e023c29699cd156f7390065dbaf/ovos_tts_plugin_mimic3_server/__init__.py#L116-L123

ssml is a bool that presumably should be sent to Mimic as a request parameter, but it is discarded.
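
A minimal sketch of what forwarding the flag could look like, assuming the server exposes an `/api/tts` endpoint that accepts an `ssml` query parameter. The endpoint path, the parameter name, and `build_tts_request` are all assumptions for illustration and should be checked against the Mimic 3 server documentation; this is not the plugin's code.

```python
from urllib.parse import urlencode


def build_tts_request(base_url, sentence, voice=None, ssml=False):
    """Return (url, body) for a hypothetical Mimic 3 server request.

    The "/api/tts" path and the "voice"/"ssml" parameter names are
    assumptions; verify them against the server you are running.
    """
    params = {}
    if voice:
        params["voice"] = voice
    if ssml:
        # Forward the flag instead of dropping it.
        params["ssml"] = "1"
    url = base_url.rstrip("/") + "/api/tts"
    if params:
        url += "?" + urlencode(params)
    return url, sentence.encode("utf-8")


url, body = build_tts_request(
    "http://localhost:59125",
    '<speak><say-as interpret-as="spell-out">Q</say-as></speak>',
    ssml=True,
)
print(url)
```

The body would then be sent as the POST payload with whatever HTTP client the plugin already uses.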