PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning models, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, a Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Winner of the NAACL 2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0

Excessive memory usage in mixed Chinese-English TTS — how can this be solved? #3469

Open mrg79433283 opened 1 year ago

mrg79433283 commented 1 year ago

```python
# -*- coding: utf-8 -*-
import tracemalloc

import paddle
import yaml
import soundfile as sf
from yacs.config import CfgNode
from paddlespeech.t2s.frontend.mix_frontend import MixFrontend
from paddlespeech.t2s.exps.syn_utils import get_am_inference
from paddlespeech.t2s.exps.syn_utils import get_voc_inference
from paddlespeech.t2s.exps.syn_utils import run_frontend

from memory_profiler import profile

sentence = """ 800多年前的今天,北京市丰台区的卢沟桥遭遇了一场猛烈的暴雨,但令人惊讶的是,这座古老的桥梁却安然无恙地屹立在这里,见证了时光的流转和历史的变迁。 卢沟桥是中国历史上著名的古桥之一,建于元代至正年间,距今已有800多年的历史。这座桥梁位于北京市丰台区卢沟镇境内,横跨于卢沟河之上,全长266.5米,宽9.3米,是一座典型的石拱桥。。 在800多年的历史中,卢沟桥经历了无数次风雨洗礼和战火炮烙,但每一次都能够奇迹般地幸免于难。其中最为著名的一次就是1937年的七七事变,日军侵华之际,卢沟桥成为了中日两军之间的战场。 在激战中,卢沟桥被毁坏殆尽,但在抗战胜利后经过修复后又恢复了原貌。而这次暴雨,则是近年来卢沟桥所遭遇的一次自然灾害。据当地居民介绍,这场暴雨是他们近几十年来见过的最大暴雨之一, 整个卢沟镇被淹没在水中,许多房屋和道路都被冲毁。但令人欣慰的是,卢沟桥并没有受到任何损害,仍然安然无恙地屹立在卢沟河之上。专家介绍说,卢沟桥之所以能够幸免于难, 主要是因为它在建造时就考虑到了防洪防涝的问题。在桥梁两侧设置了多个洪水泄洪口和堤坝,能够有效地控制卢沟河的水位和流量。此外,在修缮时也一直坚持使用传统工艺和材料, 保证了卢沟桥的稳固和耐久性。卢沟桥的安然无恙,不仅是一种幸运和奇迹,更是对我们传统文化和历史遗产的珍视和保护。我们应该更加重视和保护这些历史文化遗产,让它们能够继续传承下去, 成为我们民族文化的瑰宝和精神财富。 """

phones_dict = "./fastspeech2_mix_ckpt_1.2.0/phone_id_map.txt"

am_config_file = "./fastspeech2_mix_ckpt_1.2.0/default.yaml"
am_ckpt = "./fastspeech2_mix_ckpt_1.2.0/snapshot_iter_99200.pdz"
am_stat = "./fastspeech2_mix_ckpt_1.2.0/speech_stats.npy"
speaker_dict = "./fastspeech2_mix_ckpt_1.2.0/speaker_id_map.txt"

voc_config_file = "./hifigan_aishell3_ckpt_0.2.0/default.yaml"
voc_ckpt = "./hifigan_aishell3_ckpt_0.2.0/snapshot_iter_2500000.pdz"
voc_stat = "./hifigan_aishell3_ckpt_0.2.0/feats_stats.npy"

# voc_config_file = "./pwg_aishell3_ckpt_0.5/default.yaml"
# voc_ckpt = "./pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
# voc_stat = "./pwg_aishell3_ckpt_0.5/feats_stats.npy"


# text frontend
@profile(precision=5)
def text_frontend():
    frontend = MixFrontend(phone_vocab_path=phones_dict)
    print("frontend done!")
    return frontend


with open(am_config_file) as f:
    am_config = CfgNode(yaml.safe_load(f))


# load AM
@profile(precision=5)
def load_am():
    am_inference = get_am_inference(
        am="fastspeech2_mix",
        am_config=am_config,
        am_ckpt=am_ckpt,
        am_stat=am_stat,
        phones_dict=phones_dict,
        tones_dict=None,
        speaker_dict=speaker_dict)
    print("acoustic model done!")
    return am_inference


with open(voc_config_file) as f:
    voc_config = CfgNode(yaml.safe_load(f))


# load Voc
@profile(precision=5)
def load_voc():
    voc_inference = get_voc_inference(
        voc="hifigan_aishell3",
        voc_config=voc_config,
        voc_ckpt=voc_ckpt,
        voc_stat=voc_stat)
    # voc_inference = get_voc_inference(
    #     voc="pwgan_aishell3",
    #     voc_config=voc_config,
    #     voc_ckpt=voc_ckpt,
    #     voc_stat=voc_stat)
    print("voc done!")
    return voc_inference


# get phone id
@profile(precision=5)
def get_phone_id():
    frontend = text_frontend()
    frontend_dict = run_frontend(
        frontend=frontend,
        text=sentence,
        merge_sentences=False,
        get_tone_ids=False,
        lang="mix")
    phone_ids = frontend_dict['phone_ids']
    return phone_ids


phone_ids = get_phone_id()
am_inference = load_am()
voc_inference = load_voc()

# inference
flags = 0
print(f"len phone ids : {len(phone_ids)}")
print(phone_ids)
for i in range(len(phone_ids)):
    part_phone_ids = phone_ids[i]
    spk_id = 174  # baker:174, ljspeech:175, aishell3:0~173, vctk:176~282
    spk_id = paddle.to_tensor(spk_id)
    tracemalloc.start()
    with paddle.no_grad():
        mel = am_inference(part_phone_ids, spk_id)
        wav = voc_inference(mel)
    if flags == 0:
        wav_all = wav
        flags = 1
    else:
        wav_all = paddle.concat([wav_all, wav])
    tracemalloc.stop()
print("infer successfully.")

# save audio
wav = wav_all.numpy()
sf.write("./out.wav", wav, am_config.fs)
print("write successfully.")
```
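As an aside on the measurement itself: the script calls `tracemalloc.start()` and `tracemalloc.stop()` inside the loop but never takes a snapshot, so it reports nothing. A minimal stdlib pattern for measuring per-iteration allocation growth looks like the sketch below, where `work()` is only a stand-in for the `am_inference`/`voc_inference` calls:

```python
import tracemalloc

def work():
    # stand-in for a synthesized wav chunk (am_inference + voc_inference)
    return [0.0] * 100_000

tracemalloc.start()
baseline = tracemalloc.take_snapshot()   # snapshot before the work
chunk = work()
after = tracemalloc.take_snapshot()      # snapshot after the work
tracemalloc.stop()

# compare_to() returns StatisticDiff objects sorted largest-first;
# the list allocated in work() dominates the top entry here
top = after.compare_to(baseline, "lineno")[0]
print(top.size_diff)
```

Comparing snapshots per loop iteration would show whether Python-level allocations grow each round, or whether the growth lives in Paddle's own (C-level) allocator, which `tracemalloc` cannot see.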

When running synthesis, the more text there is, the higher the memory usage. For a text of just over 1200 characters, including model loading, it uses close to 16 GB, and memory keeps climbing as the character count grows. Is this a problem with how I am using it, or is this just how it behaves?
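One contributor to the growth in the script above is that every synthesized chunk is kept alive in `wav_all` via `paddle.concat`, so peak memory scales with the total audio length. A hypothetical alternative is to stream each chunk straight to disk as it is produced. The sketch below uses only the stdlib `wave` module so it is self-contained; in the real script the chunks would be `voc_inference(mel).numpy()` and the sample rate would be `am_config.fs`:

```python
# Hypothetical sketch: write each chunk as it arrives instead of
# concatenating everything in memory. Samples are floats in [-1, 1].
import wave
import struct

def write_chunks(path, chunks, sample_rate=24000):
    """Append chunks of float samples to a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)           # 16-bit PCM
        wf.setframerate(sample_rate)
        for chunk in chunks:         # e.g. one chunk per sentence
            pcm = struct.pack(
                "<%dh" % len(chunk),
                *(int(max(-1.0, min(1.0, s)) * 32767) for s in chunk))
            wf.writeframes(pcm)      # chunk can be freed after this call

# toy usage: two short "chunks" standing in for vocoder output
write_chunks("out.wav", [[0.0, 0.5, -0.5], [0.25, -0.25]])
```

With this pattern each `wav` tensor becomes garbage after its chunk is written, so Python-side peak memory no longer grows with the text length; whether Paddle's internal allocator returns memory to the OS is a separate question.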

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.