NVIDIA / NeMo-text-processing

NeMo text processing for ASR and TTS
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/text_normalization/wfst/wfst_text_normalization.html
Apache License 2.0
242 stars 77 forks

unexpected normalized text for Arabic #111

Closed cnlinxi closed 8 months ago

cnlinxi commented 9 months ago

Describe the bug

The Arabic normalizer fails on some currency expressions: the input 'aed1.2' is tagged as money, but verbalization then raises an error.

Steps/Code to reproduce bug


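from nemo_text_processing.text_normalization.normalize import Normalizer

lang = 'ar'  # assumed: not shown in the original snippet, inferred from the Arabic output
normalize_cache_dir = None  # the reporter's cache path was not given; None skips the grammar cache

normalizer = Normalizer(input_case='cased',
                        lang=lang,
                        cache_dir=normalize_cache_dir,
                        overwrite_cache=False)

# The original snippet assigned to `texts` but passed `text`; use one name.
text = 'aed1.2'
normalized_text = normalizer.normalize(text=text, verbose=True)
print(normalized_text)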

Running this prints the tagger output and then fails:

normalizer/escape: aed1.2
normalizer/select_tag: tokens { money { integer_part: "واحد" currency_maj: "درهم إماراتي" fractional_part: "عشرون" preserve_order: true } }
ERROR: StringFstToOutputLabels: Invalid start state
Traceback (most recent call last):
  File "run_normalize_file.py", line 24, in <module>
    normalized_text = normalizer.normalize(text=text, verbose=True)
  File "~/text_normalization/normalize.py", line 320, in normalize
    output += ' ' + Normalizer.select_verbalizer(verbalizer_lattice)
  File "~/text_normalization/normalize.py", line 479, in select_verbalizer
    output = pynini.shortestpath(lattice, nshortest=1, unique=True).string()
  File "extensions/_pynini.pyx", line 462, in _pynini.Fst.string
  File "extensions/_pynini.pyx", line 507, in _pynini.Fst.string
_pywrapfst.FstOpError: Operation failed
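
Until the grammar itself is fixed, one way to keep a batch job running past inputs like this is to catch the failure per text and fall back to the raw string. The following is only a minimal sketch of that workaround, not a fix for the bug: `normalize_with_fallback` and `_stub_normalize` are hypothetical names, the stub stands in for `normalizer.normalize`, and it is assumed that the `FstOpError` in the traceback derives from `Exception`.

```python
from typing import Callable, List, Tuple


def normalize_with_fallback(
    normalize_fn: Callable[[str], str], texts: List[str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Normalize each text; on failure keep the raw text and record the error."""
    results: List[str] = []
    failures: List[Tuple[str, str]] = []
    for text in texts:
        try:
            results.append(normalize_fn(text))
        except Exception as err:  # assumption: FstOpError is an Exception subclass
            results.append(text)  # fall back to the unnormalized input
            failures.append((text, f"{type(err).__name__}: {err}"))
    return results, failures


def _stub_normalize(text: str) -> str:
    # Hypothetical stand-in for normalizer.normalize; it fails on the reported input.
    if text == "aed1.2":
        raise RuntimeError("Operation failed")
    return text.upper()


results, failures = normalize_with_fallback(_stub_normalize, ["abc", "aed1.2"])
print(results)
print(failures)
```

With the real normalizer, the callable would be something like `lambda t: normalizer.normalize(text=t, verbose=False)`. The `failures` list then collects the inputs worth attaching to a report like this one.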

Expected behavior

The input is normalized without raising an error.

Environment overview (please complete the following information)

ekmb commented 8 months ago

@anand-nv, @mgrafu could you please take a look?