MontrealCorpusTools / mfa-models

Collection of pretrained models for the Montreal Forced Aligner
Creative Commons Attribution 4.0 International

G2P mandarin_pinyin_g2p.zip ignores repeated tokens #25

Open liubc-ai opened 7 months ago


Hi, I found that when I use mandarin_pinyin_g2p.zip to convert pinyin into phonemes, repeated tokens are ignored. How can I avoid this?

Example input: `shi4 yi1 jia1 zhi4 yao4 gong1 si1 de5 duan3 qi1 gong1`

Expected result: `sh ii4 i1 j ia1 zh ii4 iao4 g o1 ng s ii1 d e5 d ua3 n q i1 g o1 ng`

Actual result: `sh ii4 i1 j ia1 zh ii4 iao4 g o1 ng s ii1 d e5 d ua3 n q i1`

The final `g o1 ng` is missing: the second occurrence of `gong1` produces no phones at all.
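As a possible interim workaround, if the G2P step is effectively returning one pronunciation per unique token (the way a pronunciation dictionary would), the full sequence can be rebuilt by looking up every input token in order. This is only a sketch under that assumption: the tab-separated `token<TAB>phones` format and the helper names `parse_g2p_dictionary` and `expand_sequence` are hypothetical, not part of MFA. The sample lexicon lines are taken from the expected result above.

```python
# Hypothetical workaround sketch: assumes the G2P output contains one
# pronunciation per *unique* token, so repeats are re-expanded here.

def parse_g2p_dictionary(lines):
    """Parse assumed 'token<TAB>phones' lines into a token -> phone list mapping."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        token, phones = line.split("\t", 1)
        lexicon[token] = phones.split()
    return lexicon


def expand_sequence(tokens, lexicon):
    """Return the phones for every input token in order, repeats included."""
    phones = []
    for token in tokens:
        if token not in lexicon:
            raise KeyError(f"no pronunciation for {token!r}")
        phones.extend(lexicon[token])
    return phones


if __name__ == "__main__":
    text = "shi4 yi1 jia1 zhi4 yao4 gong1 si1 de5 duan3 qi1 gong1"
    # Unique-token output in the assumed format (gong1 appears only once).
    g2p_output = [
        "shi4\tsh ii4",
        "yi1\ti1",
        "jia1\tj ia1",
        "zhi4\tzh ii4",
        "yao4\tiao4",
        "gong1\tg o1 ng",
        "si1\ts ii1",
        "de5\td e5",
        "duan3\td ua3 n",
        "qi1\tq i1",
    ]
    lexicon = parse_g2p_dictionary(g2p_output)
    print(" ".join(expand_sequence(text.split(), lexicon)))
```

Because the lookup is driven by the original token list rather than by the deduplicated lexicon, the second `gong1` gets its `g o1 ng` back.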

Looking forward to your reply.