Decade Pluralization Doesn't Work For Years Pre-1000

xenotropic commented 1 year ago

Describe the bug

NeMo text processing 0.1.7rc0 correctly normalizes decades such as 1980s to "nineteen eighties", but 830s becomes "eight hundred and thirty S" instead of "eight hundred and thirties".

Steps/Code to reproduce bug

from nemo_text_processing.text_normalization.normalize import Normalizer
text = "In the 1980s personal computers became more widely available. In the 830s the Abbasid Caliphate started military excursions culminating with a victory in the Sack of Amorium."
normalizer = Normalizer(input_case='cased', lang='en')
normalized_text = normalizer.normalize(text, verbose=False, punct_post_process=True)
print(normalized_text)

Expected output

In the nineteen eighties personal computers became more widely available. In the eight hundred and thirties the Abbasid Caliphate started military excursions culminating with a victory in the Sack of Amorium.

Actual output

In the nineteen eighties personal computers became more widely available. In the eight hundred and thirty S the Abbasid Caliphate started military excursions culminating with a victory in the Sack of Amorium.

Environment overview (please complete the following information)

Environment location: Bare-metal
Method of NeMo install: pip

Environment details

If NVIDIA docker image is used you don't need to specify these. Otherwise, please provide:

OS version: Ubuntu 22.04.2 LTS
PyTorch version: 1.13.1+cu117
Python version: 3.9

xenotropic commented 1 year ago

This also affects near-future decades: 2060s is rendered as "two thousand and sixty S".
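
Until this is fixed in the grammar, a possible stopgap is to post-process the normalized text. The sketch below is not part of NeMo: `fix_decade_plurals` is a hypothetical helper, and it assumes the failure always looks like the outputs quoted in this issue, i.e. a spelled-out tens word followed by a standalone capital "S". It could misfire on text that legitimately contains such a sequence.

```python
import re

# Spelled-out tens words; each forms its plural by replacing "y" with "ies".
_TENS = ("twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety")

# A tens word followed by the stray standalone "S" seen in the reported output.
_STRAY_S = re.compile(r"\b(" + "|".join(_TENS) + r") S\b")


def fix_decade_plurals(normalized_text: str) -> str:
    """Hypothetical workaround: rewrite e.g. 'thirty S' as 'thirties'."""
    return _STRAY_S.sub(lambda m: m.group(1)[:-1] + "ies", normalized_text)


# Fragment of the actual output reported above.
actual = "In the eight hundred and thirty S the Abbasid Caliphate started military excursions."
print(fix_decade_plurals(actual))
```

This would be applied to the string returned by `normalizer.normalize(...)`. Correctly normalized decades such as "nineteen eighties" are left untouched, since they do not contain the stray "S".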

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

NVIDIA / NeMo-text-processing #72