Custom text pronunciation - GitHub Issues

uone-1 commented 5 months ago

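Just rewrite the clean_text_inf function in api.py and add a find_custom_tone function. Usage: append {tone} right after the character whose pronunciation you want to change.

Input: "我今天吃的很{2}飽。" The tone of "很" changes from the original third tone to the second tone.

Add the find_custom_tone function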

import re

def find_custom_tone(text):
    text = text.replace(" ", "")  # strip spaces
    tone_list = []
    offset = 0  # number of marker characters removed so far
    for match in re.finditer(r"\{(.*?)\}", text):
        # position in the marker-free text: the count of characters before the marker
        pos = match.start() - offset
        content = match.group(1)
        offset += 2 + len(content)  # two braces plus the marker content
        tone_list.append([content, pos])
    return re.sub(r"\{.*?\}", "", text), tone_list
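
To check what find_custom_tone returns, here is a small self-contained run on the sample sentence from the usage note. The function body follows the post, with only the braces in the regular expression escaped.

```python
import re

def find_custom_tone(text):
    text = text.replace(" ", "")
    tone_list = []
    offset = 0
    for match in re.finditer(r"\{(.*?)\}", text):
        pos = match.start() - offset       # characters before the marker
        content = match.group(1)
        offset += 2 + len(content)         # braces plus marker content
        tone_list.append([content, pos])
    return re.sub(r"\{.*?\}", "", text), tone_list

clean, tones = find_custom_tone("我今天吃的很{2}飽。")
print(clean)  # 我今天吃的很飽。
print(tones)  # [['2', 6]]
```

The position 6 is the number of characters before the marker, so the marked character 很 is the one at index 5.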

Modify the clean_text_inf function

def clean_text_inf(text: str, language):
    text, tone_data_list = find_custom_tone(text)
    phones, word2ph, norm_text = clean_text(text, language)
    print(phones)  # debug output
    for tone, pos in tone_data_list:
        # word2ph[i] is the number of phonemes for character i, so the sum over
        # the first pos characters, minus 1, is the index of the marked
        # character's last phoneme (the one that carries the tone digit)
        wd_pos = sum(word2ph[:pos]) - 1
        org_phones = phones[wd_pos]
        phones[wd_pos] = str(phones[wd_pos])[:-1] + tone
        print(f"[+] Pronunciation changed: {org_phones} => {phones[wd_pos]}")
    phones = cleaned_text_to_sequence(phones)
    return phones, word2ph, norm_text
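
The index arithmetic in clean_text_inf can be tried without loading the model. The sketch below is a standalone version of that step with an added bounds check; apply_tone_overrides is a hypothetical name, and the phoneme list and word2ph values are made up for illustration rather than taken from real clean_text output.

```python
def apply_tone_overrides(phones, word2ph, tone_list):
    """Replace the trailing tone digit of each marked character's last phoneme."""
    phones = list(phones)  # work on a copy
    for tone, pos in tone_list:
        if not 0 < pos <= len(word2ph):
            continue  # the marker does not follow any character
        idx = sum(word2ph[:pos]) - 1  # last phoneme of character pos - 1
        if phones[idx][-1].isdigit():  # leave punctuation and similar untouched
            phones[idx] = phones[idx][:-1] + tone
    return phones

# Illustrative values only: two phonemes per character, one for the full stop.
phones = ["w", "o3", "j", "in1", "t", "ian1", "ch", "ir1",
          "d", "e5", "h", "en3", "b", "ao3", "."]
word2ph = [2, 2, 2, 2, 2, 2, 2, 1]

result = apply_tone_overrides(phones, word2ph, [["2", 6]])
print(result[10:12])  # ['h', 'en2']
```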

uone-1 commented 5 months ago

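The same polyphonic character is often pronounced differently in different contexts. Example character: 角

Your usage: "这个字的读音是角{2}色，而不是角{3}色"
Suggested usage:

 "这个字的读音是角(jue2)色，而不是角(jiao3)色"
Note: 角 can be read as either jue or jiao, and many other polyphonic characters behave the same way. Conclusion: a tone number alone is probably not enough, so the marker should be pinyin plus a tone number.

Thanks for the suggestion. My original goal was only to make the generated speech sound more colloquial, so I did not consider polyphonic characters.

Anyone who needs this only has to make a small change to clean_text_inf.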
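
One way to read that small change: keep the {…} marker but let its content be either a bare tone digit or pinyin plus a digit. The sketch below only covers splitting the marker content; the helper name split_marker and the accepted grammar are assumptions for illustration, and mapping the pinyin part onto phonemes is left out.

```python
import re

# Optional lowercase pinyin letters followed by one tone digit (1-5).
_MARKER = re.compile(r"^([a-zü]*)([1-5])$")

def split_marker(content):
    """Split marker content into (pinyin, tone); pinyin is '' for a bare digit."""
    m = _MARKER.match(content.strip().lower())
    if m is None:
        raise ValueError(f"unsupported marker: {content!r}")
    return m.group(1), m.group(2)

print(split_marker("2"))      # ('', '2')
print(split_marker("jue2"))   # ('jue', '2')
print(split_marker("jiao3"))  # ('jiao', '3')
```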

HsiangLeekwok commented 4 months ago

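Thanks @uone-1. Building on your idea, I made an improved version that handles polyphonic characters, which should meet your requirement, @juntaosun. As before, you need to modify api.py, inference_webui.py, or wherever text is processed: rewrite the clean_text_inf method and add the find_custom_tone1 method (named to distinguish it from @uone-1's find_custom_tone), the get_initial_final method, and the revise_custom_tone method.

Usage: wrap a single character in a custom <tone/> tag and give the correct reading in the as attribute. Readings in pypinyin's TONE and TONE3 styles are both supported and detected automatically.

For example (an audio demo is at the end):

他说他数<tone as=shu3>数</tone>不好，所以我就<tone as='jiāo'>教</tone>他怎么<tone as="shu4">数</tone>数。

I switched to a tone tag instead of {} braces because braces already have another use in our project 😄

Add the import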

import re  # used by find_custom_tone1; skip if the file already imports it
from pypinyin import pinyin, Style

Rewrite the clean_text_inf method

def clean_text_inf(text, language):
    # Find the custom polyphone markers
    text, tone_data_list = find_custom_tone1(text)
    phones, word2ph, norm_text = clean_text(text, language)
    # If any custom readings were given, apply them to the phonemes
    # @see https://github.com/RVC-Boss/GPT-SoVITS/issues/1175
    if len(tone_data_list) > 0:
        revise_custom_tone(phones, word2ph, tone_data_list)
    phones = cleaned_text_to_sequence(phones)
    return phones, word2ph, norm_text

Add the methods find_custom_tone1, get_initial_final, and revise_custom_tone


def find_custom_tone1(text: str):
    """
    Find and extract the polyphonic characters marked in the text.
    """
    tone_list = []
    txts = []
    # Match tone tags such as <tone as=shu4>数</tone>, <tone as="shu3">数</tone>, or <tone as="shù">数</tone>
    ptn1 = re.compile(r"<tone.*?>(.*?)</tone>")
    # Strip everything in the tag except the reading
    ptn2 = re.compile(r"(</?tone)|(as)|([>\"'\s=])")
    matches = list(re.finditer(ptn1, text))
    offset = 0
    for match in matches:
        # Text before the tone tag; spaces are removed here so that pos
        # matches the space-free text returned below
        pre = text[offset:match.start()].replace(" ", "")
        txts.append(pre)
        # The single polyphonic character inside the tag
        tone_text = match.group(1)
        txts.append(tone_text)
        # Extract the reading; Style.TONE and Style.TONE3 are both accepted
        tone = match.group(0)
        tone = re.sub(ptn2, "", tone)
        tone = tone.replace(tone_text, "")
        # Number of characters up to and including the marked character
        pos = sum([len(s) for s in txts])
        offset = match.end()
        init, final = get_initial_final(tone_text, tone)
        data = [tone, init, final, pos]
        tone_list.append(data)
    # Don't forget any text left after the last tone tag
    if offset < len(text):
        txts.append(text[offset:])

    text = ''.join(str(i) for i in txts)
    text = text.replace(" ", "")  # strip spaces
    return text, tone_list

def get_initial_final(wd, py):
    """
    Match the user-supplied reading against the character's known readings and
    return the corresponding initial and final. Modeled on the
    _get_initials_finals method in text/chinese.py.
    """
    # Candidate initials
    initials = pinyin(wd, heteronym=True, neutral_tone_with_five=True, style=Style.INITIALS)[0]
    # Candidate finals
    finals_tone1 = pinyin(wd, heteronym=True, neutral_tone_with_five=True, style=Style.FINALS_TONE)[0]
    finals_tone3 = pinyin(wd, heteronym=True, neutral_tone_with_five=True, style=Style.FINALS_TONE3)[0]
    # The caller may pass either the TONE or the TONE3 style, so combine the
    # initials with the finals of both styles and compare each combination
    for init in initials:
        # TONE style (tone marks)
        for final in finals_tone1:
            compose = init + final
            if compose == py:
                return init, finals_tone3[finals_tone1.index(final)]
        # TONE3 style (tone number at the end)
        for final in finals_tone3:
            compose = init + final
            if compose == py:
                return init, final
    # No match: return empty strings so the model's default reading is kept
    return "", ""

def revise_custom_tone(phones, word2ph, tone_data_list):
    """
    Apply the custom readings to the phoneme list.
    """
    for td in tone_data_list:
        tone = td[0]
        init = td[1]
        final = td[2]
        pos = td[3]
        if init == "" and final == "":
            # No pinyin match was found, so keep the model's default reading
            continue

        wd_pos = 0
        for i in range(0, pos):
            wd_pos += word2ph[i]
        org_init = phones[wd_pos - 2]
        org_final = phones[wd_pos - 1]
        phones[wd_pos - 2] = init
        phones[wd_pos - 1] = final
        print(f"[+] Pronunciation changed: {org_init}{org_final} => {tone}")
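
As an alternative sketch of the tag-extraction step, the reading can be captured directly by the regular expression instead of being stripped out of the tag afterwards. The name extract_tone_tags and the pattern are assumptions, not part of the code above; the sketch uses the same position convention as find_custom_tone1 (the number of characters up to and including the tagged one) and leaves out the pypinyin lookup.

```python
import re

# <tone as=X>, <tone as='X'> or <tone as="X"> around the tagged character.
TONE_TAG = re.compile(r"""<tone\s+as=["']?([^"'>\s]+)["']?\s*>(.*?)</tone>""")

def extract_tone_tags(text):
    """Return (text without tags and spaces, [(reading, pos), ...])."""
    pieces, readings, cursor = [], [], 0
    for m in TONE_TAG.finditer(text):
        pieces.append(text[cursor:m.start()].replace(" ", ""))
        pieces.append(m.group(2))
        # Characters emitted so far, including the tagged one.
        readings.append((m.group(1), sum(len(p) for p in pieces)))
        cursor = m.end()
    pieces.append(text[cursor:].replace(" ", ""))
    return "".join(pieces), readings

sample = ("他说他数<tone as=shu3>数</tone>不好，所以我就"
          "<tone as='jiāo'>教</tone>他怎么<tone as=\"shu4\">数</tone>数。")
clean, readings = extract_tone_tags(sample)
print(clean)     # 他说他数数不好，所以我就教他怎么数数。
print(readings)  # [('shu3', 5), ('jiāo', 13), ('shu4', 17)]
```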

Example:

这个字的读音是<tone as="jué">角</tone>色，而不是<tone as="jiao3">角</tone>色

wav demo
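
To illustrate the two reading styles accepted in the as attribute, the sketch below converts a tone-mark reading such as jué into the numbered form jue2 using only the standard library. This is not how get_initial_final works (it compares the input against pypinyin output in both styles); to_tone3 is a hypothetical helper, and it assumes a single syllable with the neutral tone written as 5.

```python
import unicodedata

# Combining diacritics used for the four tones, mapped to tone numbers.
_TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}

def to_tone3(py):
    """Convert one syllable from tone-mark style to numbered style."""
    if py and py[-1].isdigit():
        return py  # already numbered
    tone = "5"  # no tone mark found: neutral tone
    letters = []
    for ch in unicodedata.normalize("NFD", py):
        if ch in _TONE_MARKS:
            tone = _TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that a remaining diaeresis (ü) becomes one character again.
    return unicodedata.normalize("NFC", "".join(letters)) + tone

print(to_tone3("jué"))    # jue2
print(to_tone3("jiāo"))   # jiao1
print(to_tone3("jiao3"))  # jiao3
```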

douxuebuhui commented 1 month ago

I'm using the 8.27 version of the web UI with parallel inference, and this has no effect: the tone tag is not recognized. Could you fix it?

douxuebuhui commented 1 month ago

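@HsiangLeekwok I suggest a format like the following, which avoids <tone as="jué">角</tone> and is simpler:
import LangSegment

# Keep Hanyu Pinyin in the text (TTS standard: numbered pinyin). Disabled by default.
LangSegment.setKeepPinyin(True)

# Example of specifying pinyin for Chinese characters: the parenthesized pinyin in the sentence below is recognized as Chinese.
text = "这个字的读音是角(jue2)色，而不是角(jiao3)色"
langlist = LangSegment.getTexts(text)
# The whole span, pinyin included, is kept as: zh
# Then match the result with a regular expression and modify the phonemes
The regular expression only needs a small change to parse 角(jue2) or 角(jiao3), which is simpler to implement than <tone as="jué">角</tone> or <tone as="jiao3">角</tone>.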
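
A minimal sketch of that parsing step, assuming the annotation is always a single Chinese character followed by numbered pinyin in ASCII parentheses. The function name parse_paren_pinyin and the pattern are assumptions; the sketch does not call LangSegment and only shows the regular-expression part.

```python
import re

# One CJK character followed by (pinyin + tone digit), e.g. 角(jue2).
PAREN_PINYIN = re.compile(r"([\u4e00-\u9fff])\(([a-zA-Zü]+[1-5])\)")

def parse_paren_pinyin(text):
    """Return (text without annotations, [(char, reading, index), ...])."""
    pieces, overrides, cursor, length = [], [], 0, 0
    for m in PAREN_PINYIN.finditer(text):
        before = text[cursor:m.start()]
        pieces += [before, m.group(1)]
        length += len(before) + 1
        # 0-based index of the annotated character in the cleaned text.
        overrides.append((m.group(1), m.group(2), length - 1))
        cursor = m.end()
    pieces.append(text[cursor:])
    return "".join(pieces), overrides

clean, overrides = parse_paren_pinyin("这个字的读音是角(jue2)色，而不是角(jiao3)色")
print(clean)      # 这个字的读音是角色，而不是角色
print(overrides)  # [('角', 'jue2', 7), ('角', 'jiao3', 13)]
```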

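Could you post how to modify the files for the latest version with parallel inference? This one is too old. I tried it without success, and I don't know how to adapt it myself.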

viryaka commented 1 month ago

Modify the clean_text_inf function in TextPreprocessor.py, add from pypinyin import pinyin, Style to TextPreprocessor.py, and paste the new functions get_initial_final and revise_custom_tone at the end of the file.

russell-shu commented 3 weeks ago

Thanks. How can I define a custom list of readings? For example, I want 狗 to be pronounced mao1.
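
For context, get_initial_final only accepts readings that pypinyin lists for the character, so a reading such as mao1 for 狗 falls through and the default reading is kept. One possible fallback, shown here only as a sketch, is to split the numbered pinyin itself into initial and final. split_tone3 and INITIALS are hypothetical names; the sketch treats y and w syllables as having no initial and does not reproduce any further symbol mapping the project may apply.

```python
# Mandarin initials; two-letter ones come first so "zh" wins over "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

def split_tone3(py):
    """Split numbered pinyin into (initial, final); initial may be ''."""
    for init in INITIALS:
        if py.startswith(init) and len(py) > len(init):
            return init, py[len(init):]
    return "", py  # zero-initial syllable such as an1 or er2

print(split_tone3("mao1"))    # ('m', 'ao1')
print(split_tone3("zhong1"))  # ('zh', 'ong1')
print(split_tone3("an1"))     # ('', 'an1')
```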

yitenghao commented 2 weeks ago

Take a look at PR 1728.

RVC-Boss / GPT-SoVITS

Custom text pronunciation #1175