Closed rupertthurner closed 1 year ago
Mmmm, I don't think we can actually support that: the usual knob that we can tune in speech synthesizers is only the language, not the details of the processing... Which means we can only support LANG, and not LC_* values that differ from it. What you can do, however, is use SSML tagging within the text to change the language on the fly.
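To make the SSML suggestion concrete, here is a minimal sketch that wraps each piece of text in a sentence element carrying its own language tag. The helper name `ssml_with_language` is hypothetical, and whether a given synthesizer actually honors a per-sentence `xml:lang` is an assumption that has to be checked against the backend in use.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_with_language(parts):
    """Build an SSML string from (language_tag, text) pairs.

    Hypothetical helper: it only assembles markup and does not talk to
    speech-dispatcher or to any synthesizer.
    """
    body = "".join(
        # escape() protects &, < and > in the text; quoteattr() quotes the tag.
        f"<s xml:lang={quoteattr(lang)}>{escape(text)}</s>"
        for lang, text in parts
    )
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(ssml_with_language([
        ("en", "Zürich has a population of over"),
        ("de-CH", "428'700"),
    ]))
```

The resulting string would then be sent with SSML mode enabled in the client (for `spd-say`, check `spd-say --help` for the SSML option on your version). Note that this only switches the language; it does not by itself change how a synthesizer reads digit grouping.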
@sthibaul speechd is not only wrong for LC_NUMERIC, but also for LANG, set:
❯ set LANG de_CH.UTF-8
❯ spd-say "1'130,01"
spoken (wrong): eins einhundertdreissig komma null eins ("one, one hundred thirty point zero one"). expected: eintausend einhundertdreissig komma null eins ("one thousand one hundred thirty point zero one").
but LANG is not something that can be used here anyway. take this example: english is a global language, so many sites use it, but use of course the local formatting, and write: "Zürich is the largest city in Switzerland with a population of over 428'700, an increase of 19'500 since year 2000. 1,4 million people live in Zürich agglomeration." (from facts and figures)
if we switch the language over to german, then it would speak: "population of over vierhundertachtundzwanzig siebenhundert an increase .." ("four hundred twenty-eight, seven hundred"), which is the same error as in english, but in german. the same holds for swiss french.
so, if you do not want to take into account LC_NUMERIC for number formatting, what about considering at least undisputed ones, independent of language and formatting?
this should already fix many edge cases. the real challenge is the inversion of comma and dot depending on the region, i can well imagine. that creates headaches.
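As an illustration of what handling an "undisputed" grouping mark could look like, here is a minimal client-side sketch that strips apostrophes between digit groups before the text is handed to a speech client. It is not something speechd or any synthesizer does: the function name `strip_apostrophe_grouping` is hypothetical, and treating the apostrophe (straight or typographic) as always being a grouping mark is an assumption drawn from the examples in this thread. Commas and dots are deliberately left alone, since their meaning depends on the region.

```python
import re

# Assumption: an apostrophe between a 1-3 digit group and groups of exactly
# three digits is a thousands separator, never a decimal mark.
_GROUPED = re.compile(r"(?<![\d'’])\d{1,3}(?:['’]\d{3})+(?![\d'’])")


def strip_apostrophe_grouping(text: str) -> str:
    """Remove apostrophe digit grouping; leave commas and dots untouched."""
    return _GROUPED.sub(lambda m: re.sub(r"['’]", "", m.group(0)), text)


if __name__ == "__main__":
    print(strip_apostrophe_grouping("1'130,01"))
    print(strip_apostrophe_grouping("over 428'700, an increase of 19'500"))
```

With this, `1'130,01` becomes `1130,01` and `428'700` becomes `428700`, while text such as `o'clock` or an irregular group like `12'34` is left unchanged. How a synthesizer then reads the remaining comma or dot is still up to the synthesizer and the selected language.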
> many sites use it, but use of course the local formatting
That doesn't seem very common to me.
> if you do not want to take into account LC_NUMERIC for number formatting
The problem is not that I don't want to take it into account, but that speech synthesizers don't provide the interface to do so.
> considering at least undisputed ones
It's the synthesizers which implement this, so that is where this suggestion should be reported.
> It's the synthesizers which implement this, so that is where this suggestion should be reported.
oh, really? is this what i have installed synthesizers? what would speechd then do, it has no influence on the text, like it can (should) not remove digit groupings?
❯ paru -Q | grep espe
espeak-ng 1.51.1-2
espeakup 0.90-2
> is this what i have installed synthesizers
Yes, espeak-ng is a synthesizer.
> it has no influence on the text, like it can (should) not remove digit groupings?
It'd be very fragile to do something about digit grouping, since synthesizers may have different behaviors in different languages. I would be terribly cautious with trying to mangle the content.
ok, created: https://github.com/espeak-ng/espeak-ng/issues/1812 .
if one tries the following, all but the first two are spoken incorrectly:
Obtained behavior
speechd treats the dot as the decimal separator and the comma as the thousands separator.
Expected behavior
speechd should have three modes:
the options are taken from digit grouping