NVIDIA / NeMo-text-processing

NeMo text processing for ASR and TTS
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/text_normalization/wfst/wfst_text_normalization.html
Apache License 2.0

Decade Pluralization Doesn't Work For Years Pre-1000 #72

Closed xenotropic closed 1 year ago

xenotropic commented 1 year ago

Describe the bug

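NeMo text processing 0.1.7rc0 correctly normalizes a decade such as "1980s" to "nineteen eighties", but "830s" becomes "eight hundred and thirty S": the number is spelled out and the plural suffix is left behind as a separate letter instead of being folded into the decade.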

Steps/Code to reproduce bug

from nemo_text_processing.text_normalization.normalize import Normalizer

text = "In the 1980s personal computers became more widely available. In the 830s the Abbasid Caliphate started military excursions culminating with a victory in the Sack of Amorium."
normalizer = Normalizer(input_case='cased', lang='en')
normalized_text = normalizer.normalize(text, verbose=False, punct_post_process=True)
print(normalized_text)

Expected output

In the nineteen eighties personal computers became more widely available. In the eight hundred and thirties the Abbasid Caliphate started military excursions culminating with a victory in the Sack of Amorium.

Actual output

In the nineteen eighties personal computers became more widely available. In the eight hundred and thirty S the Abbasid Caliphate started military excursions culminating with a victory in the Sack of Amorium.
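
The symptom in the actual output is a standalone capital "S" after the spelled-out number. A minimal sketch of a check for it follows, assuming that a lone "S" token is a good enough signal. `find_unpluralized_decades` is a hypothetical helper, not part of NeMo, and the heuristic would also flag legitimate text such as spaced-out initials.

```python
import re

# Heuristic (assumption): a word followed by a lone capital "S" token marks
# a decade whose plural suffix was split off instead of being pluralized.
STRAY_S = re.compile(r"\b(\w+) S\b")


def find_unpluralized_decades(normalized_text):
    """Return the word that precedes each stray "S" token."""
    return STRAY_S.findall(normalized_text)


if __name__ == "__main__":
    actual = (
        "In the nineteen eighties personal computers became more widely "
        "available. In the eight hundred and thirty S the Abbasid Caliphate "
        "started military excursions culminating with a victory in the Sack "
        "of Amorium."
    )
    print(find_unpluralized_decades(actual))
```

Running it on the actual output above flags the word before the stray "S", while "Sack" is not matched because the "S" there is part of a longer word.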

xenotropic commented 1 year ago

This also affects near-future decades: "2060s" is rendered as "two thousand and sixty S".
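
Until this is fixed in the grammar, one possible workaround is to spell out decade tokens before the text reaches the normalizer. The sketch below is not NeMo code: `decade_to_words` and `expand_decades` are hypothetical helpers. The three-digit wording ("eight hundred and thirties") follows the expected output in this report, and reading "2060s" as "twenty sixties" is an assumption about the desired form. Decades ending in "00s" are left untouched.

```python
import re

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "ten", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Plural decade names, indexed by the tens digit.
PLURAL_TENS = ["", "tens", "twenties", "thirties", "forties", "fifties",
               "sixties", "seventies", "eighties", "nineties"]

# One or two leading digits, the tens digit, then a literal "0s".
DECADE = re.compile(r"([1-9]\d?)(\d)0s")


def two_digit_words(n):
    """Spell out an integer from 1 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + (" " + ONES[ones] if ones else "")


def decade_to_words(token):
    """Spell out a token like "830s" or "2060s"; return it unchanged if unsupported."""
    match = DECADE.fullmatch(token)
    if not match:
        return token
    prefix, tens_digit = match.group(1), int(match.group(2))
    if tens_digit == 0:
        return token  # e.g. "2000s": left for the normalizer to handle
    plural = PLURAL_TENS[tens_digit]
    if len(prefix) == 1:
        # Three-digit year: wording taken from the expected output above.
        return f"{ONES[int(prefix)]} hundred and {plural}"
    # Four-digit year: assumed "century pair + decade" reading.
    return f"{two_digit_words(int(prefix))} {plural}"


def expand_decades(text):
    """Replace every three- or four-digit decade token in text."""
    return re.sub(r"\b[1-9]\d{1,2}0s\b",
                  lambda m: decade_to_words(m.group(0)), text)


if __name__ == "__main__":
    print(expand_decades("In the 830s and again in the 2060s."))
```

The result of `expand_decades(text)` can then be passed to `normalizer.normalize(...)` exactly as in the reproduction snippet, since the decade tokens are already plain words by that point.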

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 7 days since being marked as stale.