Closed by vikcost 3 months ago
This is intended behaviour: numbers below 10 are kept in their spoken form. To avoid this, you can comment out https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/inverse_text_normalization/ru/taggers/cardinal.py#L44.
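To make the rule concrete, here is a toy sketch in plain Python. It is not the NeMo grammar (which is built from WFSTs); the lookup table, the function names, and the `keep_spoken_below` parameter are all hypothetical and exist only to illustrate the "numbers below 10 stay spoken" behaviour, and roughly what disabling that restriction would change.

```python
# Toy illustration only; this is NOT how NeMo-text-processing is implemented.
# Hypothetical lookup of a few spoken Russian cardinals and their values.
SPOKEN_CARDINALS = {
    "три": 3,
    "девять": 9,
    "десять": 10,
    "двадцать": 20,
}


def inverse_normalize_token(token: str, keep_spoken_below: int = 10) -> str:
    """Return digits for a known cardinal unless its value is below the threshold."""
    value = SPOKEN_CARDINALS.get(token)
    if value is None or value < keep_spoken_below:
        # Unknown word, or a small number that is deliberately left spoken.
        return token
    return str(value)


def inverse_normalize(text: str, keep_spoken_below: int = 10) -> str:
    """Apply the token rule to a whitespace-separated sentence."""
    return " ".join(
        inverse_normalize_token(tok, keep_spoken_below) for tok in text.split()
    )


print(inverse_normalize("три года"))                       # small number stays spoken
print(inverse_normalize("десять лет"))                     # 10 and above become digits
print(inverse_normalize("три года", keep_spoken_below=0))  # restriction disabled
```

Setting the hypothetical `keep_spoken_below` to 0 plays the role of commenting out the linked line: every known cardinal is then converted to digits.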
Thanks, I figured that out too. What is the intended use case for this behavior?
One might expect (inverse) normalization to behave uniformly across languages.
Nevertheless, even with the suggested change, inverse normalization of ordinals is error-prone:
третий год -> третий год # expected '3-й год'
тридцать третий час -> 33 час # expected '33-й час'
https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/inverse_text_normalization/ru/taggers/ordinal.py#L42 should be commented out for ordinals too.
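For reference, the expected outputs quoted in this thread ('3-й год', '33-й час') can be pinned down with a toy sketch. This is plain Python, not the NeMo ordinal tagger; the two small lookup tables and the function name are hypothetical, and only masculine nominative forms (suffix "-й") are covered.

```python
# Toy illustration only; NOT the NeMo ordinal grammar.
# Hypothetical lookups covering just the examples from this thread.
TENS = {"двадцать": 20, "тридцать": 30}
ORDINALS = {"первый": 1, "второй": 2, "третий": 3}  # masculine nominative only


def inverse_normalize_ordinals(text: str) -> str:
    """Rewrite spoken ordinals as '<digits>-й', including small ones like 'третий'."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in TENS and nxt in ORDINALS:
            # Compound ordinal such as "тридцать третий" -> "33-й".
            out.append(f"{TENS[tok] + ORDINALS[nxt]}-й")
            i += 2
        elif tok in ORDINALS:
            # Simple ordinal such as "третий" -> "3-й".
            out.append(f"{ORDINALS[tok]}-й")
            i += 1
        else:
            out.append(tok)
            i += 1
    return " ".join(out)


print(inverse_normalize_ordinals("третий год"))
print(inverse_normalize_ordinals("тридцать третий час"))
```

A real grammar would also need to handle gender and case endings, which this sketch deliberately ignores.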
The motivation is to avoid normalization in cases like "one of us".
The same problem exists in English. What can be done to resolve it there?
@ekmb thanks for the example.
Intuitively, it should be possible to build a graph that accounts for cases such as "one of us" and returns the input unchanged, without any inverse normalization.
On the other hand, it is totally fine to expect the following: "one of us" -> "1 of us".
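The idea of returning identity for specific phrases can be sketched with a regular expression instead of a WFST graph. This is a minimal sketch under stated assumptions: the `IDENTITY_PHRASES` list, the `NUMBER_WORDS` table, and the function name are hypothetical, and nothing here uses or mirrors the NeMo API.

```python
import re

# Hypothetical number-word table; a real grammar covers far more.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3"}

# Hypothetical list of phrases that should pass through unchanged.
IDENTITY_PHRASES = ["one of us", "one another"]

# Protected phrases are listed first in the alternation, so at any position
# they are tried before the bare number words.
_PATTERN = re.compile(
    r"\b(?:(?P<keep>"
    + "|".join(re.escape(p) for p in IDENTITY_PHRASES)
    + r")|(?P<num>"
    + "|".join(NUMBER_WORDS)
    + r"))\b"
)


def _replace(match: re.Match) -> str:
    if match.group("num"):
        # Bare number word: convert to digits.
        return NUMBER_WORDS[match.group("num")]
    # Protected phrase: return it as-is (identity).
    return match.group(0)


def inverse_normalize_en(text: str) -> str:
    """Convert number words to digits, except inside protected phrases."""
    return _PATTERN.sub(_replace, text)


print(inverse_normalize_en("one of us has two cats"))
print(inverse_normalize_en("one cat"))
```

A phrase list like this does not scale well, which is one reason a blanket "keep small numbers spoken" rule can be the simpler design choice; whether "one of us" -> "1 of us" is acceptable depends on the downstream use.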
Example: