brailcom / speechd

Common high-level interface to speech synthesis
GNU General Public License v2.0

The abbreviation "cf." is not properly read in French #829

Open FiableDotBiz opened 1 year ago

FiableDotBiz commented 1 year ago

Steps to reproduce

spd-say -l fr cf.

Obtained behavior

The abbreviation "cf." is read letter by letter, as "c. f." ([seɛf] in the International Phonetic Alphabet).

Expected behavior

The abbreviation "cf." should be read as "confer" ([kõˈfɛʁ] in the International Phonetic Alphabet). This is because the abbreviation is widely used, well known (it even appears in the dictionary of the French Academy: https://www.dictionnaire-academie.fr/article/A9C3488-A ) and unambiguous.

Behavior information

Please follow the next step, to provide us with precious information about how things went wrong on your machine:

Distribution

Fedora Workstation 38 (up to date)

speech-dispatcher.log espeak-ng.log espeak-ng-mbrola.log

Version of Speech-dispatcher

0.11.4

sthibaul commented 1 year ago

Mmmm, in this case we can probably introduce a generic rule that all synthesizers will benefit from, indeed.

FiableDotBiz commented 1 year ago

I don't know if there is enough artificial intelligence in speech-dispatcher or in Orca, but the reading of other abbreviations could be made context-dependent. For instance, if the text is detected to be a Christian one, Bible book abbreviations should be read in full: "Gn" → "Genèse", "Ex" → "Exode", etc., as a human reader would do.

sthibaul commented 1 year ago

There is no AI in the speech stack :) I actually don't think we should put any, as in: we'd rather have a deterministic behavior.
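
To illustrate what a deterministic, synthesizer-independent rule of this kind could look like, here is a minimal Python sketch. It is not speech-dispatcher's actual code or configuration: the `ABBREVIATIONS` table and the `expand_abbreviations` function are hypothetical names introduced only for this example. The idea is a fixed per-language lookup applied to the text before it reaches any synthesizer, so the same input always yields the same output.

```python
import re

# Hypothetical per-language table (an assumption for illustration only):
# abbreviation -> text to be spoken instead.
ABBREVIATIONS = {
    "fr": {"cf.": "confer"},
}


def expand_abbreviations(text: str, lang: str) -> str:
    """Replace known abbreviations for `lang` with their spoken form.

    Purely table-driven: no context detection, so the result is
    fully determined by the input text and the language code.
    """
    table = ABBREVIATIONS.get(lang, {})
    for abbr, expansion in table.items():
        # Match the abbreviation only as a standalone token, i.e. not
        # glued to a preceding or following word character.
        pattern = r"(?<!\w)" + re.escape(abbr) + r"(?!\w)"
        text = re.sub(pattern, expansion, text, flags=re.IGNORECASE)
    return text


print(expand_abbreviations("cf. page 3", "fr"))  # confer page 3
print(expand_abbreviations("cf. page 3", "en"))  # cf. page 3 (no French rule applied)
```

Context-dependent abbreviations such as "Gn" are deliberately left out of the table: expanding them would require guessing the kind of text, which is exactly the non-deterministic behavior argued against above.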