Is the rule 15 correctly implemented?

Dear contributors, thank you for your great work!

I have been trying to improve TTS quality while keeping the amount of data unchanged.

I thought that using the g2pk package would improve the model by significantly reducing the number of distinct tokens fed into it. Rule 8, for example, reduces 21 Jongsung tokens to 7 pronounceable Jongsung tokens.

I combined GlowTTS, g2pk, and Multi-band MelGAN, trained them on the KSS dataset, and obtained the following result.

G2PK Comparison Demo

It seems that g2pk grapheme tokens are much better than just using Jamo tokens!

Yet I found that the g2pk conversion result differs slightly from how Koreans commonly pronounce the text.

Since I am no expert in the Korean language, I referred to 한국어 어문 규범 (the Korean language norms) and 부산대학교 표준발음 변환기 (the Pusan National University standard pronunciation converter).

For a sample sentence from the KSS dataset:

| Source | Result |
| --- | --- |
| Original Sentence | 저는 귀가 어두운데 다른 사람의 얘기를 아주 잘 들어 준다는 말을 많이 들어왔어요. |
| G2PK | 저는 귀가 어두운데 다른 사라믜 얘기르 라주 잘 드러 준다는 마를 마니 드러와써요. |
| 부산대학교 | 저는 귀가 어두운데 다른 사라메 얘기를 아주 잘 드러 준다는 마를 마니 드러와써요 |

I suspect that "르 라" is the problem, since Koreans do not commonly speak like that.

I found the following function in the source file regular.py.
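The differences between the two transcriptions can be located word by word. The following is a minimal sketch that only compares the two output strings quoted above; the helper name `differing_words` is hypothetical and is not part of g2pk.

```python
def differing_words(a, b):
    """Return (index, word_a, word_b) for each position where the two
    space-separated transcriptions differ. A trailing period is ignored."""
    words_a = a.rstrip(".").split()
    words_b = b.rstrip(".").split()
    if len(words_a) != len(words_b):
        raise ValueError("transcriptions have different word counts")
    return [(i, x, y) for i, (x, y) in enumerate(zip(words_a, words_b)) if x != y]


# The two outputs quoted in the comparison above.
g2pk_out = "저는 귀가 어두운데 다른 사라믜 얘기르 라주 잘 드러 준다는 마를 마니 드러와써요."
pnu_out = "저는 귀가 어두운데 다른 사라메 얘기를 아주 잘 드러 준다는 마를 마니 드러와써요"

diffs = differing_words(g2pk_out, pnu_out)
for index, g2pk_word, pnu_word in diffs:
    print(index, g2pk_word, pnu_word)
```

This reports three differing positions: 사라믜 vs 사라메, 얘기르 vs 얘기를, and 라주 vs 아주. The last two together form the "르 라" linking across the space; the first appears to concern the pronunciation of the particle 의 and is a separate matter.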


```python
def link3(inp, descriptive=False, verbose=False):
    rule = rule_id2text["15"]
    out = inp

    pairs = [ ("ᆨ ᄋ", " ᄀ"),
              ...
              ("ᆹ ᄋ", "ᆸ ᄊ") ]

    for str1, str2 in pairs:
        out = out.replace(str1, str2)

    gloss(verbose, out, inp, rule)
    return out
```

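한국어 어문 규범 states rule 15 (제15항) as follows: 받침 뒤에 모음 ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’ 들로 시작되는 실질 형태소가 연결되는 경우에는, 대표음으로 바꾸어서 뒤 음절 첫소리로 옮겨 발음한다. In English: when a final consonant (받침) is followed by a content morpheme (실질 형태소) that begins with one of the vowels ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’, the final consonant is changed to its representative sound (대표음) and pronounced as the initial sound of the following syllable.

It seems that link3 checks neither whether the following morpheme is a content morpheme (실질 형태소) nor whether it begins with one of the vowels ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’.

Are these conditions handled elsewhere in the source?

If not, I would like to implement them myself. Please let me know if you have already improved this part.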
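A minimal sketch of what a vowel-restricted variant could look like, assuming the input is the same space-separated conjoining-jamo string that link3 operates on. The names `LINK_PAIRS`, `RULE15_VOWELS`, and `link3_vowel_restricted` are hypothetical. The table is only an illustrative subset: the first and last entries come from the pairs visible in link3, and the ᆯ entry is inferred from the "르 라" output. The 실질 형태소 condition is not implemented here, since it would need a morphological analyzer.

```python
import re
import unicodedata

# Illustrative subset only; the full pair table in link3 is elided above.
# final consonant -> (final consonant that remains, initial consonant moved over)
LINK_PAIRS = {
    "\u11a8": ("", "\u1100"),        # ᆨ -> initial ᄀ (pair shown in link3)
    "\u11af": ("", "\u1105"),        # ᆯ -> initial ᄅ (assumed from "르 라")
    "\u11b9": ("\u11b8", "\u110a"),  # ᆹ -> final ᆸ + initial ᄊ (pair shown in link3)
}
IEUNG = "\u110b"  # silent initial ᄋ
# Vowels named by rule 15 (ㅏ, ㅓ, ㅗ, ㅜ, ㅟ) as conjoining medial jamo.
RULE15_VOWELS = "\u1161\u1165\u1169\u116e\u1171"

_PATTERN = re.compile(
    "([" + "".join(LINK_PAIRS) + "]) " + IEUNG + "(?=[" + RULE15_VOWELS + "])"
)


def link3_vowel_restricted(inp):
    """Link a final consonant across a space only when the next syllable
    starts with one of the rule 15 vowels. No morpheme check is done."""
    def repl(match):
        remaining_final, moved_initial = LINK_PAIRS[match.group(1)]
        return remaining_final + " " + moved_initial
    return _PATTERN.sub(repl, inp)


# NFD decomposes Hangul syllables into conjoining jamo; NFC recomposes them.
decomposed = unicodedata.normalize("NFD", "를 아주")
print(unicodedata.normalize("NFC", link3_vowel_restricted(decomposed)))
```

Because "아주" begins with ㅏ, this variant still links "를 아주" into "르 라주", so the vowel condition alone does not change the sample sentence.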

**The real problem is that the g2pk conversion result above is actually the correct answer according to 한국어 어문 규범!**

"아주" starts with "ㅏ" and is a 실질 형태소, and the 대표음 of the "ㄹ" in "를" is "ㄹ".

So, "를 아주" should be pronounced "르 라주" according to 한국어 어문 규범.

I have been thinking about this issue for several weeks, and I have concluded that Korean speakers tend to insert a pause, like a comma, at the space " " between words when they feel it is needed. "얘기를 아주" becomes "얘기를, 아주" so that "아" is pronounced as "아" and is not confused with "라". Yet I have not found a good algorithm for applying rule 15 selectively in a way that matches my intuition.

As a quick fix, I simply disabled link3 and labeled the result "G2PK no 15" on the demo page.

I have already achieved a satisfactory experimental result, and it seems fine to extend my research on phoneme and grapheme alignment.

But, as I mentioned earlier, I am not an expert in Korean or in linguistics in general.

So I would appreciate an opinion from a real linguist to properly improve my TTS results and G2P conversion.

If you have any opinion regarding my questions, please share it.

Thanks.
