2noise / ChatTTS

A generative speech model for daily dialogue.
https://2noise.com
GNU Affero General Public License v3.0

Always a different sounding output #731

Open kevincobain2000 opened 3 months ago

kevincobain2000 commented 3 months ago

The previous issue was closed by the bot: https://github.com/2noise/ChatTTS/issues/196

With the same parameters and the same speaker, the generated audio sounds completely different from one run to the next.

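import os

import ChatTTS
import soundfile

# soundfile.write() does not create missing directories, so make sure outputs/ exists.
os.makedirs("outputs", exist_ok=True)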

chat = ChatTTS.Chat()
chat.load(compile=True)

woman_speaker = "蘁淰敡欀揄彂縠潮倕啛诊卬嚧梵欘缵嬜莶媆臛厛剆帅瞗琦忾昒帎獂懖睃埍蛿呫拪瑱半孚眒眄豣燘叕蝛纑罕絚莈業哚跎毙區娺伺廭蘌沚垸炻糀喟攆噃狉蛯訐倒豱暍襫敥炕熔勡犕虩貏群朤扃琫芾滩昷旟眊捚洚獈彶仢猷莣彙緅蒷携笺倹倿僜墮箧蒒實壅纓茔但拃護殈畴睠昝叾灱蕬絟尅籔蠰毚洮君嶽昵唺挒紵焾譙垷晊茷蘴堉胑衬凹兪瑍稦嵫朋芉桉起汐涔揮傄彠焅豘曶曍桦譵清蛃穁刁笎嚔傩悘愴漺諩繑糒凊偎蛬緟狍蔜瓃庝結殮橤蒫巈讃椑嬏棝续虹媫衳爣沐蕢穀墩头伟煠洆叔耏嚉珟芾奇瓔荂翵蚖癥腅綆缌皼劣圾跌婀糙悕瀌裂珦美菻弼氒於灤莳恥唤垬妭莛愊氫劈街蘗腣綶狽扽甮貹峀葔绁篡歀婂愩荀疪些敭媾苂庾贻孠篗码凒簲楰族乶塙疀盲橿潽杅窓耀硅硴舻盖攕缜貉卛愀覄潣伐佶栳猔箛夼氟匎匿椖槱婏豈臎姃撢諅滰洼谵畞纓瑜姵椭梗瑟戣噣蒯橀愯娴褪搘箸蠱罈娹昀笺掐胱窘穴詡堇梿岙滫孡譧屼拐櫁紣涥坤礓佌妊滉皵慑璐扳畐亴肳樂婛笧撉愞罿剖殫炳呚盔亂秄睔臺湜櫛施耕禌倅痜淡僧堚爐悾罐箻桵笞葉嚳吷媞爍偹蓆懼溟器乸淼聇礮攧縧檈焪牠坶乧幡棚葜櫡倹珊蝱僂劸旣巙艵臛伧抝做佧祽奭惲拌赦暉倄垑蠷炋姱徐菢岭燓籨蓗俰朁习搉槳堆蓠导曒搚爃瀘衱奔篺映嫣媒裘涊蒈慑窥掝娻恹檥喥蘓獋媆灈祏怃娊蜏沲煱澬泘氖斝缺汎殑棼碏箝岀蜽朂慊呪狹穠檽昇羙缾紧簚甂箤乆羨桼葰暽嘲呫寎廠凢準秘庱蛭绎奉历厊爘偏蚠蛬蒭槥蛹摰覒溅拦蒀宆惧曡瑩硞蜧罚祔嚳嫬莑倡谉咪穸宾聰燽厣皚妲抦蛑伫尡蘳宷岵廅埣枞瀿籿褲壸疷働稤徫皎澐冞爦宠榤奦籘碑絟牍瘪繽裶朾豟蒎嬊呖擇興曲澭艆萈語巓婥慉堕绘張苨乎墦喡沍絜毓掷廉諳毑崓罢虵涍叜臮捝婾讛儷焴茄築挋巎脸杲嚄嫆蛋袣芮贆璸搓蔎菸懩孶杇橺炣屽劉渘诂涀誨偬聞痯倷厎呖薒繖脳栴纀樂梖樜葘攀獉柵亞恹召獨曭櫆诣惜艢糏塡槪簹岙涋設推眔乐夕穲忚櫫區籄诤倘儒畢倱咨別秭褸熘撥粼埠彆茝嫜脢湽譀蘦肸秚珟癄耖艴嬹搳嵀揠畄圫捅丨奿絽譺尒溢奌欼咓崪薱舡讇衞詈篧嵣稗妔糣咗珚岰浪傷烶晾匂舱愪膽脜攖狦歞杘烝疮潀搿茏赓譩漺圳俖巽点賑棏喀巫稲趿科嵺蕡楠奼货吙淝灍杵蘇斱愽俜樿葎壠舦屁攲璙舴涽旀矞睚癩浿妍谏怵蟾莵橠弞蜵艿彟慿奐譈涗聟籹绱媐篻椺潃誋櫝朅泺矗礓虃翂莇蟋傾璪褖咝焯厀棄尹嗟屄一㴆"

rand_spk = chat.sample_random_speaker()  # a random speaker is sampled here...
rand_spk = woman_speaker                 # ...but immediately replaced by the fixed embedding above
print(rand_spk)

params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb=rand_spk,   # fixed speaker embedding
    temperature=0.3,    # custom sampling temperature
    top_P=0.1,          # top-P (nucleus) decoding
    top_K=1,            # top-K decoding
)

###################################
# For sentence level manual control.

# Use oral_(0-9), laugh_(0-2) and break_(0-7) to insert special tokens
# into the text to be synthesized.
params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)

###################################
# For word level manual control.

text = """Welcome to the unspoken truth."""

wavs = chat.infer(
    text,
    skip_refine_text=False,
    params_refine_text=params_refine_text,
    params_infer_code=params_infer_code,
)

soundfile.write("outputs/output1.wav", wavs[0], 24000)

text = """This should be same as the previous one."""

wavs = chat.infer(
    text,
    skip_refine_text=False,
    params_refine_text=params_refine_text,
    params_infer_code=params_infer_code,
)

soundfile.write("outputs/output2.wav", wavs[0], 24000)
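
The `temperature`, `top_P` and `top_K` values in `InferCodeParams` above control how each audio token is drawn. The following is a generic, standard-library-only sketch of temperature scaling with top-K and top-P (nucleus) filtering; it is not ChatTTS's own implementation, and `sample_token` is a hypothetical helper. It shows two things: with `top_K=1` only the single most likely token survives, so the draw is deterministic whatever the seed; with wider settings the result depends on the RNG state, and two generators created with the same seed produce the same sequence.

```python
import math
import random
from typing import Optional, Sequence


def sample_token(
    logits: Sequence[float],
    temperature: float = 0.3,
    top_k: int = 1,
    top_p: float = 0.1,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick one token index using temperature, top-K and top-P filtering (illustrative only)."""
    if rng is None:
        rng = random.Random()  # unseeded: results can differ between runs

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-K: keep only the k most likely token indices, most likely first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw from the surviving tokens; rng.choices renormalises the weights itself.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

If ChatTTS's code decoder behaves like this sketch, the near-greedy settings in the repro would leave little randomness in that step, which suggests (an unverified guess) that the variation comes from other stochastic steps, such as the text-refinement pass, which runs with its own sampling parameters.
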

kevincobain2000 commented 3 months ago

Sorry, after storing the speaker embedding in a pickle file, the output seems roughly consistent. However, the issue still exists in the web UI.
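
A minimal sketch of storing and reloading the speaker embedding with `pickle`, as described above. It assumes the embedding is the plain string returned by `chat.sample_random_speaker()` (the `woman_speaker` string in the repro suggests it is); `save_speaker` and `load_speaker` are hypothetical helper names. Because the value is just a string, writing it to a UTF-8 text file would work equally well and avoids unpickling data from untrusted sources.

```python
import pickle
from pathlib import Path


def save_speaker(spk_emb: str, path: Path) -> None:
    """Persist a speaker embedding string so later runs can reuse exactly the same value."""
    with path.open("wb") as f:
        pickle.dump(spk_emb, f)


def load_speaker(path: Path) -> str:
    """Load an embedding written by save_speaker. Only unpickle files you trust."""
    with path.open("rb") as f:
        return pickle.load(f)
```

The reloaded string would then be passed as `spk_emb` to `ChatTTS.Chat.InferCodeParams`, exactly like `rand_spk` in the script above.
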

kevincobain2000 commented 3 months ago

Nah, fixing the speaker doesn't work: the voice sometimes switches from female to male.
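
One commonly suggested workaround, offered here as an untested assumption rather than a confirmed fix, is to reseed every random number generator with the same value immediately before each `chat.infer` call, so every stochastic step starts from the same state. `seed_everything` is a hypothetical helper; it seeds NumPy and PyTorch only if they are installed. Note that GPU kernels can still be slightly nondeterministic, and a fixed seed does not guarantee the same voice across *different* input texts, which is what this issue reports.

```python
import importlib.util
import random


def seed_everything(seed: int) -> None:
    """Seed Python's RNG, plus NumPy's and PyTorch's when available (hypothetical helper)."""
    random.seed(seed)
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np

        np.random.seed(seed)
    if importlib.util.find_spec("torch") is not None:
        import torch

        torch.manual_seed(seed)  # seeds the CPU and all CUDA generators
```

In the repro script this would mean calling, for example, `seed_everything(1234)` right before each of the two `chat.infer(...)` calls.
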

Joseph513shen commented 2 months ago

I have the same problem. The voice seems stuck on a male voice, and I can't get a female voice even when I use the rand_spk from your code.

haydonryan commented 4 weeks ago

This is still a huge problem; it basically makes the model useless for longer passages of text.
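
For longer passages, one approach people try (an assumption, not a fix confirmed in this thread) is to split the text into sentence-sized chunks, synthesize each chunk separately with the same fixed speaker embedding and the same seed, and then concatenate the audio. `split_into_chunks` and its 200-character default are hypothetical; the splitter only looks at common English and Chinese sentence-ending punctuation.

```python
import re
from typing import List

# A run of non-terminal characters followed by any sentence-ending punctuation.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = [s.strip() for s in _SENTENCE.findall(text)]
    sentences = [s for s in sentences if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the current chunk
        else:
            if current:
                chunks.append(current)
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be passed to `chat.infer` in turn, reusing the same `params_infer_code`.
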