```python
# -*- coding: utf-8 -*-
import tracemalloc

import paddle
import yaml
import soundfile as sf
from yacs.config import CfgNode
from memory_profiler import profile
from paddlespeech.t2s.frontend.mix_frontend import MixFrontend
from paddlespeech.t2s.exps.syn_utils import get_am_inference
from paddlespeech.t2s.exps.syn_utils import get_voc_inference
from paddlespeech.t2s.exps.syn_utils import run_frontend

sentence = """ 800多年前的今天,北京市丰台区的卢沟桥遭遇了一场猛烈的暴雨,但令人惊讶的是,这座古老的桥梁却安然无恙地屹立在这里,见证了时光的流转和历史的变迁。 卢沟桥是中国历史上著名的古桥之一,建于元代至正年间,距今已有800多年的历史。这座桥梁位于北京市丰台区卢沟镇境内,横跨于卢沟河之上,全长266.5米,宽9.3米,是一座典型的石拱桥。。 在800多年的历史中,卢沟桥经历了无数次风雨洗礼和战火炮烙,但每一次都能够奇迹般地幸免于难。其中最为著名的一次就是1937年的七七事变,日军侵华之际,卢沟桥成为了中日两军之间的战场。 在激战中,卢沟桥被毁坏殆尽,但在抗战胜利后经过修复后又恢复了原貌。而这次暴雨,则是近年来卢沟桥所遭遇的一次自然灾害。据当地居民介绍,这场暴雨是他们近几十年来见过的最大暴雨之一, 整个卢沟镇被淹没在水中,许多房屋和道路都被冲毁。但令人欣慰的是,卢沟桥并没有受到任何损害,仍然安然无恙地屹立在卢沟河之上。专家介绍说,卢沟桥之所以能够幸免于难, 主要是因为它在建造时就考虑到了防洪防涝的问题。在桥梁两侧设置了多个洪水泄洪口和堤坝,能够有效地控制卢沟河的水位和流量。此外,在修缮时也一直坚持使用传统工艺和材料, 保证了卢沟桥的稳固和耐久性。卢沟桥的安然无恙,不仅是一种幸运和奇迹,更是对我们传统文化和历史遗产的珍视和保护。我们应该更加重视和保护这些历史文化遗产,让它们能够继续传承下去, 成为我们民族文化的瑰宝和精神财富。 """

phones_dict = "./fastspeech2_mix_ckpt_1.2.0/phone_id_map.txt"
am_config_file = "./fastspeech2_mix_ckpt_1.2.0/default.yaml"
am_ckpt = "./fastspeech2_mix_ckpt_1.2.0/snapshot_iter_99200.pdz"
am_stat = "./fastspeech2_mix_ckpt_1.2.0/speech_stats.npy"
speaker_dict = "./fastspeech2_mix_ckpt_1.2.0/speaker_id_map.txt"

voc_config_file = "./hifigan_aishell3_ckpt_0.2.0/default.yaml"
voc_ckpt = "./hifigan_aishell3_ckpt_0.2.0/snapshot_iter_2500000.pdz"
voc_stat = "./hifigan_aishell3_ckpt_0.2.0/feats_stats.npy"
# alternative vocoder (commented out; load_voc below uses hifigan_aishell3)
# voc_config_file = "./pwg_aishell3_ckpt_0.5/default.yaml"
# voc_ckpt = "./pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz"
# voc_stat = "./pwg_aishell3_ckpt_0.5/feats_stats.npy"


# text frontend
@profile(precision=5)
def text_frontend():
    frontend = MixFrontend(phone_vocab_path=phones_dict)
    print("frontend done!")
    return frontend


with open(am_config_file) as f:
    am_config = CfgNode(yaml.safe_load(f))


# load AM
@profile(precision=5)
def load_am():
    am_inference = get_am_inference(
        am="fastspeech2_mix",
        am_config=am_config,
        am_ckpt=am_ckpt,
        am_stat=am_stat,
        phones_dict=phones_dict,
        tones_dict=None,
        speaker_dict=speaker_dict)
    print("acoustic model done!")
    return am_inference


with open(voc_config_file) as f:
    voc_config = CfgNode(yaml.safe_load(f))


# load Voc
@profile(precision=5)
def load_voc():
    voc_inference = get_voc_inference(
        voc="hifigan_aishell3",
        voc_config=voc_config,
        voc_ckpt=voc_ckpt,
        voc_stat=voc_stat)
    print("voc done!")
    return voc_inference


# get phone id
@profile(precision=5)
def get_phone_id():
    frontend = text_frontend()
    frontend_dict = run_frontend(
        frontend=frontend,
        text=sentence,
        merge_sentences=False,
        get_tone_ids=False,
        lang="mix")
    return frontend_dict["phone_ids"]


phone_ids = get_phone_id()
am_inference = load_am()
voc_inference = load_voc()

# inference
flags = 0
print(f"len phone ids : {len(phone_ids)}")
print(phone_ids)
for i in range(len(phone_ids)):
    part_phone_ids = phone_ids[i]
    spk_id = 174  # baker:174, ljspeech:175, aishell3:0~173, vctk:176~282
    spk_id = paddle.to_tensor(spk_id)
    tracemalloc.start()
    with paddle.no_grad():
        mel = am_inference(part_phone_ids, spk_id)
        wav = voc_inference(mel)
    # accumulate per-sentence audio into wav_all
    if flags == 0:
        wav_all = wav
        flags = 1
    else:
        wav_all = paddle.concat([wav_all, wav])
print("infer successfully.")

# save audio
wav = wav_all.numpy()
sf.write("./out.wav", wav, am_config.fs)
print("write successfully.")
```
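One pattern that keeps memory flat regardless of input length is to write each sentence's audio to disk as soon as it is synthesized, instead of concatenating everything into `wav_all` first. A minimal, dependency-free sketch of that idea using the stdlib `wave` module (`synth_chunk` is a hypothetical stand-in for the vocoder output; it is not part of PaddleSpeech):

```python
import math
import os
import struct
import wave

RATE = 24000


def synth_chunk(n, freq, rate=RATE):
    # hypothetical stand-in for one sentence of vocoder output:
    # n int16 samples of a sine wave
    return [int(32767 * 0.2 * math.sin(2 * math.pi * freq * t / rate))
            for t in range(n)]


path = "out_streamed.wav"
with wave.open(path, "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)       # int16
    f.setframerate(RATE)
    for i in range(3):      # one iteration per sentence
        chunk = synth_chunk(2400, 220 * (i + 1))
        f.writeframes(struct.pack("<%dh" % len(chunk), *chunk))
        # chunk is dropped here; nothing accumulates across sentences

with wave.open(path, "rb") as f:
    nframes = f.getnframes()
print(nframes)  # 3 chunks * 2400 frames = 7200
os.remove(path)
```

The same structure should carry over to the script above by calling the write step once per loop iteration with `wav.numpy()`.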
When running the synthesis task, memory usage grows with the amount of input text: with 1200+ characters, the process uses close to 16 GB including model loading, and memory keeps rising as the character count increases. Is this a problem with how I am using it, or is this the expected behavior?
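The script calls `tracemalloc.start()` inside the loop but never reads a snapshot, so it cannot yet tell retained growth from per-sentence churn. A minimal, PaddleSpeech-free sketch of that distinction (`fake_infer` is a hypothetical stand-in for one inference step whose output is kept alive, like `wav_all`):

```python
import tracemalloc


def fake_infer(store, n):
    # hypothetical stand-in for one am/voc step whose output is retained
    store.append([0.0] * n)


tracemalloc.start()
retained = []
per_iter = []
for i in range(3):
    fake_infer(retained, 50_000)
    current, _peak = tracemalloc.get_traced_memory()
    per_iter.append(current)
tracemalloc.stop()

# A monotonically rising `current` across iterations means allocations from
# earlier iterations survive (here the `retained` list). Flat `current` with a
# high `peak` would instead point to transient per-sentence allocations.
print(per_iter[0] < per_iter[1] < per_iter[2])
```

Logging `tracemalloc.get_traced_memory()` once per sentence in the real loop would show which of the two patterns the 16 GB growth follows.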