NVIDIA / NeMo-text-processing

NeMo text processing for ASR and TTS
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/text_normalization/wfst/wfst_text_normalization.html
Apache License 2.0

Tn en astronomical no #28

Closed: anand-nv closed this PR 1 year ago

anand-nv commented 1 year ago

What does this PR do?

Adds support for normalizing numbers larger than a trillion (quadrillion, quintillion, and so on).
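Supporting numbers above a trillion mostly means handling more three-digit groups and attaching the right short-scale word to each one. Below is a minimal plain-Python sketch of that grouping step, assuming short-scale English names up to quintillion. It is not NeMo's pynini grammar; `SCALES` and `split_into_scales` are hypothetical names used only for illustration.

```python
# Hypothetical short-scale names; index i corresponds to 1000**i.
SCALES = ["", "thousand", "million", "billion", "trillion", "quadrillion", "quintillion"]


def split_into_scales(n: int) -> list[tuple[int, str]]:
    """Split a non-negative integer into (three-digit group, scale word) pairs,
    most significant group first. All-zero groups are skipped."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    groups = []
    i = 0
    while n:
        n, group = divmod(n, 1000)  # peel off the lowest three digits
        if i >= len(SCALES):
            raise ValueError("number exceeds the largest scale word in this sketch")
        if group:
            groups.append((group, SCALES[i]))
        i += 1
    return groups[::-1]  # most significant group first
```

For the 21-digit number discussed in this thread, `split_into_scales(124_444_234_854_823_834_553)` yields seven pairs, from `(124, 'quintillion')` down to `(553, '')`. Each group is then verbalized as an ordinary number below one thousand and followed by its scale word.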


jimregan commented 1 year ago

124,444,234,854,823,834,553~one hundred twenty four quintillion four hundred forty four quadrillion two hundred thirty four trillion eight hundred fifty four billion eight hundred twenty three million eight hundred thirty four thousand five hundred and fifty three

Um... As a speaker of non-US English, I find this a strange mix of US/non-US number patterns. "one hundred and twenty four quintillion four hundred and forty four quadrillion two hundred and thirty four trillion eight hundred and fifty four billion eight hundred and twenty three million eight hundred and thirty four thousand five hundred and fifty three" is how it would typically be read on this side of the Atlantic.

yzhang123 commented 1 year ago

@anand-nv, could you take a look? ^

anand-nv commented 1 year ago

I'm not sure I'm the best person to make a decision here, considering that South Asia follows its own number system. However, South Asian usage does follow the British convention of using "and" when reading cardinals in the global (million/billion) system. Since the existing grammar already uses "and" in the last group (e.g. "five hundred and fifty three"), and there seem to be variations even within the US on how "and" is used (https://english.stackexchange.com/questions/3518/american-vs-british-english-meaning-of-one-hundred-and-fifty), I'll leave it to someone else to standardize the usage. The change is trivial to implement: either don't apply add_optional_and to the graph, or extend add_optional_and to all cases.
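The three policies under discussion are: no "and" anywhere, "and" only in the last group (what the existing grammar does), or "and" after every "hundred" (British usage). A small plain-Python verbalizer can illustrate them. This is a hedged sketch, not the NeMo implementation, which controls this behavior through add_optional_and on its WFST graph. `verbalize_cardinal` and its `and_mode` parameter are hypothetical names chosen for illustration.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
# Hypothetical short-scale names; index i corresponds to 1000**i.
SCALES = ["", "thousand", "million", "billion", "trillion", "quadrillion", "quintillion"]


def verbalize_below_100(n: int) -> str:
    """Verbalize 1..99 without hyphens, matching the examples in this thread."""
    if n < 20:
        return ONES[n]
    tens, units = divmod(n, 10)
    return TENS[tens] if units == 0 else f"{TENS[tens]} {ONES[units]}"


def verbalize_group(n: int, use_and: bool) -> str:
    """Verbalize 1..999, optionally inserting 'and' after 'hundred'."""
    hundreds, rest = divmod(n, 100)
    words = []
    if hundreds:
        words += [ONES[hundreds], "hundred"]
        if rest and use_and:
            words.append("and")
    if rest:
        words.append(verbalize_below_100(rest))
    return " ".join(words)


def verbalize_cardinal(n: int, and_mode: str = "last") -> str:
    """and_mode: 'none' (no 'and'), 'last' ('and' only in the final group),
    or 'all' ('and' after every 'hundred', British style)."""
    if and_mode not in ("none", "last", "all"):
        raise ValueError(f"unknown and_mode: {and_mode}")
    if n < 0:
        raise ValueError("expected a non-negative integer")
    if n == 0:
        return "zero"
    groups = []  # least significant group first
    while n:
        n, group = divmod(n, 1000)
        groups.append(group)
    if len(groups) > len(SCALES):
        raise ValueError("number exceeds the largest scale word in this sketch")
    parts = []
    for i in range(len(groups) - 1, -1, -1):  # most significant group first
        if groups[i] == 0:
            continue
        use_and = and_mode == "all" or (and_mode == "last" and i == 0)
        part = verbalize_group(groups[i], use_and)
        if SCALES[i]:
            part += " " + SCALES[i]
        parts.append(part)
    return " ".join(parts)
```

For 124,444,234,854,823,834,553, `and_mode="last"` reproduces the reading in the PR's test example, and `and_mode="all"` reproduces the British reading jimregan suggested. In this sketch, "extending add_optional_and to all cases" amounts to switching the default from `"last"` to `"all"`.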