Open dmylzenova opened 1 month ago
% not normalizing in Italian and French. This fix will be available shortly and will also cause the numbers in these languages to normalize correctly.
and some other units not normalizing in French and are working to address that.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Hello,
I have observed that digits remain unnormalized in the output text when using the NeMo text normalization library, specifically with European languages such as German (de), Italian (it), and French (fr). The normalized output is expected to contain no digits at all.
Here is an example:
Expected output: No digits in the normalized text.
Actual output: 'il 48% ha risposto che avrebbe dovuto provenire dal proprio budget.'
Additional Examples:
Other examples with similar behavior, in the format (text, normalized_text):
Expected Behavior: The normalized text should not contain any digits.
Actual Behavior: Digits are retained in the normalized output, which contradicts the expected behavior of a text normalization tool. The issue is intermittent rather than consistent, which is particularly problematic for tasks that require clean, digit-free text, such as grapheme-to-phoneme (g2p) conversion.
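To make the check concrete, here is a minimal sketch of how leftover digits could be detected in (text, normalized_text) pairs. It uses only the Python standard library and does not call the normalizer itself. The helper name `find_digit_leaks` and the placeholder inputs are hypothetical and only for illustration. The one real value is the actual output quoted above.

```python
import re

# Matches any Unicode decimal digit (ASCII 0-9 as well as other scripts' digits).
DIGIT_RE = re.compile(r"\d")


def find_digit_leaks(pairs):
    """Return the (text, normalized_text) pairs whose normalized text still has a digit."""
    return [(text, normalized) for text, normalized in pairs if DIGIT_RE.search(normalized)]


# First pair: the normalized text is the actual output reported above; the input
# text is a placeholder because it is not shown here.
# Second pair: a made-up, digit-free case included only for contrast.
pairs = [
    ("<input text>", "il 48% ha risposto che avrebbe dovuto provenire dal proprio budget."),
    ("<hypothetical input>", "ventitré persone"),
]

leaks = find_digit_leaks(pairs)
print(len(leaks))  # number of pairs that still contain digits
```

In practice the pairs would come from running the normalizer over a corpus (for example via `Normalizer(input_case="cased", lang="it").normalize(text)`, assuming the usual nemo_text_processing entry point), and a non-empty result would flag the sentences that need attention before g2p.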
Environment:
NeMo version: I use nemo_text_processing, version 0.3.0rc0.
Python version: Python 3.11.8