NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

[text processing] Chinese text normalization PR #4543

Closed: dophist closed this issue 2 years ago

dophist commented 2 years ago

This is a preliminary inquiry: is the NeMo team open to accepting a PR for Chinese text normalization, or do you have internal development plans for it?

SpeechColab is assigning one intern student to work on Chinese text normalization (based on Pynini), and we'd like to contribute this work to NeMo, hopefully in the next couple of weeks. What we will have is a set of WFST rewrite rules, just like the existing NeMo English TN, reproducing the TN behaviors in this project, which handles many typical TN scenarios for Chinese.
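To make "TN behavior" concrete, here is a minimal sketch of one typical scenario: reading short Arabic-digit numbers as Chinese cardinals. It is plain Python rather than a Pynini WFST grammar, and it is not taken from the planned PR; the names `verbalize_cardinal` and `normalize` are hypothetical, and the sketch assumes only standalone numbers of one to four digits need handling.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def verbalize_cardinal(number: str) -> str:
    """Read a 1-4 digit string as a Chinese cardinal (illustrative only)."""
    value = int(number)
    if value == 0:
        return DIGITS[0]
    digits = str(value)  # drops any leading zeros
    out = []
    pending_zero = False
    for i, ch in enumerate(digits):
        d = int(ch)
        unit = UNITS[len(digits) - 1 - i]
        if d == 0:
            # An internal run of zeros is read as a single "零";
            # trailing zeros are not read at all.
            pending_zero = True
            continue
        if pending_zero:
            out.append(DIGITS[0])
            pending_zero = False
        out.append(DIGITS[d] + unit)
    text = "".join(out)
    # 10-19 are conventionally read "十", "十一", ... not "一十", "一十一", ...
    if text.startswith("一十"):
        text = text[1:]
    return text


def normalize(sentence: str) -> str:
    """Replace standalone runs of 1-4 ASCII digits with their spoken form."""
    pattern = r"(?<![0-9])[0-9]{1,4}(?![0-9])"
    return re.sub(pattern, lambda m: verbalize_cardinal(m.group()), sentence)


print(normalize("我有3个苹果和15本书"))
```

A WFST grammar would express the same mapping as composable rewrite rules and would also need to cover dates, money, measures, and context-dependent readings, all of which this sketch ignores.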

Known issue: The intern student will no longer be maintaining the grammars after the internship, but the resulting PR is guaranteed to establish a foundation, enabling Chinese TN in NeMo.

We'd like to hear your thoughts on this. @yzhang123 @ekmb

ekmb commented 2 years ago

We would appreciate the proposed PR! Looking forward to the contribution!

dophist commented 2 years ago

Cool. I'll close this issue when the PR is ready.

dophist commented 2 years ago

The resulting PR has been created at https://github.com/NVIDIA/NeMo/pull/4638, so I'm closing this issue. Reviews and discussions can continue there.