NVIDIA / NeMo-text-processing

NeMo text processing for ASR and TTS
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/text_normalization/wfst/wfst_text_normalization.html
Apache License 2.0

zh text normalizer cannot handle " #103

Closed npuichigo closed 6 months ago

npuichigo commented 10 months ago
In [12]: written = "你好\""

In [13]: normalizer.normalize(written, verbose=True)
tokens { name: "你好"" }
ERROR: StringFstToOutputLabels: Invalid start state
---------------------------------------------------------------------------
FstOpError                                Traceback (most recent call last)
Cell In[13], line 1
----> 1 normalizer.normalize(written, verbose=True)

File ~/LocalCodes/NeMo-text-processing/nemo_text_processing/text_normalization/normalize.py:354, in Normalizer.normalize(self, text, verbose, punct_pre_process, punct_post_process)
    352     if verbalizer_lattice is None:
    353         raise ValueError(f"No permutations were generated from tokens {s}")
--> 354     output += ' ' + Normalizer.select_verbalizer(verbalizer_lattice)
    355 output = SPACE_DUP.sub(' ', output[1:])
    357 if self.lang == "en" and hasattr(self, 'post_processor'):

File ~/LocalCodes/NeMo-text-processing/nemo_text_processing/text_normalization/normalize.py:642, in Normalizer.select_verbalizer(lattice)
    632 @staticmethod
    633 def select_verbalizer(lattice: 'pynini.FstLike') -> str:
    634     """
    635     Given verbalized lattice return shortest path
    636
   (...)
    640     Returns: shortest path
    641     """
--> 642     output = pynini.shortestpath(lattice, nshortest=1, unique=True).string()
    643     # lattice = output @ self.verbalizer.punct_graph
    644     # output = pynini.shortestpath(lattice, nshortest=1, unique=True).string()
    645     return output

File extensions/_pynini.pyx:462, in _pynini.Fst.string()

File extensions/_pynini.pyx:507, in _pynini.Fst.string()

FstOpError: Operation failed
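
The verbose line tokens { name: "你好"" } points to a likely cause. The tagger copies the raw " into a field that is itself delimited by ", so the serialized token is no longer well-formed. The verbalizer then appears to produce an empty lattice, and Fst.string() fails with "Invalid start state". The sketch below is a hypothetical, self-contained illustration of that failure mode using a regex parser. `serialize`, `parse` and `FIELD_RE` are made-up names, not NeMo's grammar or API, and escaping is only one possible fix.

```python
import re

# Hypothetical illustration only: mimics the `tokens { name: "..." }` shape of
# the tagger output. This is NOT NeMo's grammar; it shows why an unescaped `"`
# inside a quoted field makes the token string unparseable.
FIELD_RE = re.compile(r'^tokens \{ name: "((?:[^"\\]|\\.)*)" \}$')


def serialize(value: str, escape: bool) -> str:
    """Wrap `value` in a name field, optionally backslash-escaping quotes."""
    if escape:
        value = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'tokens {{ name: "{value}" }}'


def parse(token: str):
    """Return the unescaped field value, or None if the token is malformed."""
    m = FIELD_RE.match(token)
    if m is None:
        return None  # no valid parse, analogous to an empty lattice
    return re.sub(r"\\(.)", r"\1", m.group(1))


if __name__ == "__main__":
    raw = serialize('你好"', escape=False)
    print(raw, parse(raw))      # malformed token, parse returns None
    safe = serialize('你好"', escape=True)
    print(safe, parse(safe))    # escaped token round-trips to 你好"
```

With escape=False, serialize('你好"', escape=False) reproduces the exact token string from the report, and parse rejects it. With escape=True, the value round-trips.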

github-actions[bot] commented 8 months ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 8 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

npuichigo commented 8 months ago

This is not resolved.

ekmb commented 8 months ago

@BuyuanCui could you please take a look at this?

BuyuanCui commented 8 months ago

Hello, I tried running zh TN on my local machine and got the output below. If this looks correct, the issue should be solved once I push my changes to the zh TN grammar.

alcui@NV-640H9C3:~/NEMO_1004_ZHTNBUG/NeMo-text-processing$ python nemo_text_processing/text_normalization/normalize.py --lang='zh' --text='你好' --verbose
tokens { name: "你" } tokens { name: "好" }
你好
Execution time: 0.04 sec

npuichigo commented 8 months ago

Could you also try --text='你好"'? The problem is specifically with the double-quote character (").
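
Until the grammar is fixed, one possible stopgap is to rewrite ASCII double quotes before calling normalizer.normalize. The helper below, `replace_ascii_double_quotes`, is hypothetical and not part of NeMo. It assumes the zh grammar accepts the curly quotes “ and ”, which has not been verified here, and it deletes the quotes instead when strip=True.

```python
def replace_ascii_double_quotes(text: str, strip: bool = False) -> str:
    """Hypothetical pre-processing workaround (not a NeMo API).

    Replaces ASCII double quotes with curly quotes, alternating between an
    opening and a closing quote, or removes them entirely when strip=True.
    """
    if strip:
        return text.replace('"', "")
    out = []
    opening = True  # the next ASCII quote becomes an opening quote
    for ch in text:
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(replace_ascii_double_quotes('他说"你好"'))
    print(replace_ascii_double_quotes('你好"', strip=True))
```

For example, call normalizer.normalize(replace_ascii_double_quotes(written), verbose=True) with written = "你好\"" as in the original report. If the curly quotes also fail, stripping is the safer choice, at the cost of dropping the quotation marks from the output.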

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 6 months ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.