Speech-Rule-Engine / speech-rule-engine

Generating speech descriptions for XML structures
https://zorkow.github.io/speech-rule-engine/
Apache License 2.0

Accents/diacritics in Clearspeak #632

Open klobetime opened 2 years ago

klobetime commented 2 years ago

Stumbled across this when working with word problems:

<math>
  <mfrac>
    <mtext>Number of lattés sold</mtext>
    <mtext>Number of cafés</mtext>
  </mfrac>
</math>

With Clearspeak and SRE v3.2 this results in "Number of latt e acute s sold over Number of caf e acute s" -- note the accented letters break the word and explicitly describe the modified letter. Mathspeak also splits the speech: "StartFraction Number of latt modifying above e with acute s sold Over Number of caf modifying above e with acute s EndFraction"

I tried using <mi> and <mi><mtext> in place of <mtext> above, which changes the results a bit but still expands the accented letters.

I'm certainly open to other approaches for handling word problems in MathML (this example is a simplification for illustration), but suspect that diacritics will interrupt the resulting speech across the board.

zorkow commented 2 years ago

Sorry, this one fell through the cracks. There was little time for SRE at the beginning of the year. The issue is due to SRE parsing the text in order to find and translate math symbols. This is done using a regexp for the particular locale. The English regexp does not contain characters with diacritics (unlike the French one, for example), hence SRE assumes that these letters are accented math characters and translates them accordingly.
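
To make the mechanism concrete, here is a rough sketch of how a per-locale letter class can split a word at an accented character. It is not SRE's actual code: the `LOCALE_LETTERS` table, the `split_text` helper, and both patterns are made up for illustration, and the real regexps are likely more involved.

```python
import re

# Hypothetical letter classes per locale (NOT SRE's real patterns).
# The French class lists a few accented letters via escapes, e.g. \u00e9 is e-acute.
LOCALE_LETTERS = {
    "en": "a-zA-Z",
    "fr": "a-zA-Z\u00e0\u00e2\u00e7\u00e8\u00e9\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb",
}


def split_text(text, locale):
    """Split text into runs of locale letters and single leftover characters.

    Anything outside the locale's letter class comes out as its own token,
    which a later step could treat as a math symbol.
    """
    pattern = re.compile("[" + LOCALE_LETTERS[locale] + r"]+|\S")
    return pattern.findall(text)


print(split_text("latt\u00e9s sold", "en"))  # the accented letter becomes its own token
print(split_text("latt\u00e9s sold", "fr"))  # the word stays in one piece
```

With the English class the word comes back as three tokens ("latt", the accented letter, "s"), which matches the broken speech reported above. With the French class it stays whole.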

Making a general change to the way text elements are treated would be difficult: it would badly affect content that does not explicitly mark up single math characters in mtext elements, which is the case for quite a number of publishers.

If you have control over the content, I would suggest using one of the attributes 'aria-label', 'exact-speech', or 'alt' on the "latte" element.
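
If the content is generated programmatically, the attribute could be added in a preprocessing step. The sketch below is only an illustration. The helper name `label_non_ascii_mtext` is made up, and it assumes, following the suggestion above, that SRE speaks the attribute value as given instead of parsing the element text. It also assumes the MathML carries no namespace, as in the example from the original report.

```python
import xml.etree.ElementTree as ET


def label_non_ascii_mtext(mathml, attribute="aria-label"):
    """Copy the text of each <mtext> containing non-ASCII characters into an attribute.

    Hypothetical helper: the attribute name defaults to 'aria-label', one of the
    attributes mentioned above. Namespaced MathML would need the qualified tag
    name '{http://www.w3.org/1998/Math/MathML}mtext' instead of 'mtext'.
    """
    root = ET.fromstring(mathml)
    for elem in root.iter("mtext"):
        text = elem.text or ""
        if not text.isascii():
            elem.set(attribute, text)
    return ET.tostring(root, encoding="unicode")


source = (
    "<math><mfrac>"
    "<mtext>Number of latt\u00e9s sold</mtext>"
    "<mtext>Number of shops</mtext>"
    "</mfrac></math>"
)
print(label_non_ascii_mtext(source))
```

Only the first mtext element gets the attribute here, since the second one is plain ASCII. Whether the spoken result is acceptable still depends on how the downstream speech engine pronounces the accented word.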