yt605155624 opened this issue 2 years ago
./run.sh --stage 0 --stop-stage 5
check oov
get mfa result
sh: 1: mfa_align: Exec format error
generate durations.txt
extract feature
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data] [Errno 111] Connection refused>
[nltk_data] Error loading cmudict: <urlopen error [Errno 111]
[nltk_data] Connection refused>
196 1
100%|███████████████████████████████████████████████████████████████████████████████████| 196/196 [00:00<00:00, 5146.26it/s]
Done
Traceback (most recent call last):
  File "local/extract_feature.py", line 346, in <module>
The code in File "/home/nx/study/python/Paddle24/PaddleSpeech/paddlespeech/t2s/datasets/data_table.py", line 45:
    self.data = data
    assert len(data) > 0, "This dataset has no examples"
@UserName-wang Follow https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md to download nltk_data into your ${HOME}.
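The first log also shows `sh: 1: mfa_align: Exec format error`. That error usually means the kernel refused to execute the file. The most common cause is a binary built for a different CPU architecture than the host. Another cause is a file that is not an executable at all, such as a truncated download.

Below is a minimal standard-library sketch that reads the ELF header and compares it with the host. The helper names are hypothetical. The binary path in the usage note is an assumption about where the example unpacks Montreal Forced Aligner.

```python
import platform
import struct

# e_machine values for a few common architectures (from the ELF specification).
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def parse_elf_machine(header):
    """Return the architecture recorded in the first 20 bytes of an ELF file, or None."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None  # not an ELF executable (script, error page, truncated download, ...)
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_machine is the 16-bit field at offset 18: after e_ident (16 bytes) and e_type (2 bytes)
    (machine,) = struct.unpack(endian + "H", header[18:20])
    return ELF_MACHINES.get(machine, f"unknown({machine})")


def elf_arch(path):
    """Read the header of a file on disk and return its ELF architecture (or None)."""
    with open(path, "rb") as f:
        return parse_elf_machine(f.read(20))


def compare_with_host(path):
    """Return (binary architecture, host architecture) for a side-by-side check."""
    return elf_arch(path), platform.machine()
```

Usage (path assumed): `print(compare_with_host("tools/montreal-forced-aligner/bin/mfa_align"))`. On 64-bit Linux, a result like `('x86_64', 'aarch64')` would mean the prebuilt aligner cannot run on this machine.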
I am doing few-shot fine-tuning following PaddleSpeech/examples/other/tts_finetune/tts3.
Running run_mix.sh gives the following error:
root@autodl-container-9db311a83c-4d0bf061:~/autodl-tmp/PaddleSpeech/examples/other/tts_finetune/tts3# ./run_mix.sh
check oov
get mfa result
align.py:60: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 12.0
/root/autodl-tmp/PaddleSpeech/examples/other/tts_finetune/tts3/tools/montreal-forced-aligner/lib/aligner/models.py:87: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Creating dictionary information...
Setting up training data...
Calculating MFCCs...
Calculating CMVN...
Number of speakers in corpus: 1, average number of utterances per speaker: 12.0
Done with setup.
100%|########################################################################################################| 2/2 [00:02<00:00, 1.01s/it]
Done! Everything took 6.651328802108765 seconds
generate durations.txt
Traceback (most recent call last):
  File "local/generate_duration.py", line 38, in <module>
    gen_duration_from_textgrid(mfa_dir, duration_file, fs, n_shift)
  File "/root/autodl-tmp/PaddleSpeech/utils/gen_duration_from_textgrid.py", line 76, in gen_duration_from_textgrid
    durations_dict[name] = (speaker, readtg(
  File "/root/autodl-tmp/PaddleSpeech/utils/gen_duration_from_textgrid.py", line 29, in readtg
    for interval in alignment.tierDict["phones"].entryList:
AttributeError: 'Textgrid' object has no attribute 'tierDict'
Using Python 3.8.
I get the same problem with Python 3.7.9 on Ubuntu 22. Has this been solved?
Check whether your praatio version is 5.0.0.
@zhouzyc @maize-j This is probably caused by an incompatible praatio upgrade: https://github.com/timmahrt/praatIO/blob/main/UPGRADING.md#version-5-to-6-migration
Yes. praatio now installs 6.0.0 by default, and that version is not backward compatible, which causes this error. Downgrading to 5.0.0 fixes it.
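On an older checkout, you can check for this before running the recipe. Below is a small pre-flight sketch that reads the installed praatio version with `importlib.metadata` and flags 6.x. Per this thread, 6.x breaks the `alignment.tierDict` access in `utils/gen_duration_from_textgrid.py`.

The helper names are hypothetical. The suggested remedy only repeats the downgrade described above.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib-metadata backport
    import importlib_metadata as metadata


def praatio_major(version):
    """Return the major version from a string such as '5.0.0' or '6.0.1'."""
    return int(version.split(".")[0])


def praatio_is_compatible(version):
    # Assumption based on this thread: the failing code uses the praatio 5.x API
    # (e.g. Textgrid.tierDict), which the 6.x Textgrid object no longer has.
    return praatio_major(version) < 6


def check_installed_praatio():
    """Describe the installed praatio version and whether it matches the 5.x API."""
    try:
        version = metadata.version("praatio")
    except metadata.PackageNotFoundError:
        return "praatio is not installed"
    if praatio_is_compatible(version):
        return f"praatio {version}: 5.x API, OK"
    return f"praatio {version}: 6.x API; downgrade with pip install praatio==5.0.0"


if __name__ == "__main__":
    print(check_installed_praatio())
```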
fixed by https://github.com/PaddlePaddle/PaddleSpeech/pull/2970
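In Docker, one step of get_frontend downloads a 589 MB file, probably a BERT checkpoint, and it is downloaded again every time I start the container. I could not find the relevant code in the project. Where is this 589 MB file downloaded from, what is it for, and where is it stored? I would like to download it locally and mount it into the container so that it is not downloaded every time.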
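Solved: mount the three folders under /root/ in the Docker container: nltk_data, .paddlenlp and .paddlespeech. The 589 MB file is G2PWModel_1.1.zip. Do not keep only G2PWModel_1.1/ and delete the zip; if the zip is deleted, it gets downloaded again.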
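The fix above amounts to bind-mounting those three folders from the host. Below is a sketch that builds the matching `docker run -v` arguments. It also checks that G2PWModel_1.1.zip is still present, since deleting it triggers a re-download.

The host directory (`/data/ps_cache` in the usage note), the image name `IMAGE`, and the helper names are hypothetical. The `/root` container home comes from the comment above.

```python
import os

# Cache folders under /root/ that the comment above says to mount.
CACHE_DIRS = ("nltk_data", ".paddlenlp", ".paddlespeech")


def docker_volume_args(host_root, container_home="/root"):
    """Build `-v host:container` arguments that persist the caches across container runs."""
    args = []
    for name in CACHE_DIRS:
        host_path = os.path.join(os.path.abspath(host_root), name)
        args += ["-v", f"{host_path}:{container_home}/{name}"]
    return args


def g2pw_zip_present(paddlespeech_dir):
    """Return True if G2PWModel_1.1.zip exists anywhere under the given directory."""
    for _root, _dirs, files in os.walk(paddlespeech_dir):
        if "G2PWModel_1.1.zip" in files:
            return True
    return False
```

Usage (hypothetical paths): `print(" ".join(["docker", "run", *docker_volume_args("/data/ps_cache"), "IMAGE"]))`.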
./run.sh --stage 0 --stop-stage 5
check oov
get mfa result
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 688.0
Creating dictionary information...
Setting up corpus_data directory...
Generating base features (mfcc)...
Calculating CMVN...
Done with setup.
There were 1 segments/files not aligned. Please see ./mfa_result/unaligned.txt for more details on why alignment failed for these files.
Done! Everything took 53.481459617614746 seconds
generate durations.txt
extract feature
686 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 686/686 [00:00<00:00, 8198.77it/s]Done
Traceback (most recent call last):
  File "local/extract_feature.py", line 346, in <module>
(venv) ant@DESKTOP-MEKU9AN:/mnt/d/voice/PaddleSpeech/examples/other/tts_finetune/tts3$ ls ~/nltk_data/
corpora  taggers
I have already downloaded nltk_data into my home directory, but I still get this error. What could be the reason?
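Having `corpora` and `taggers` folders does not by itself show that the two specific resources from the log are where NLTK looks. Those resources are `averaged_perceptron_tagger` and `cmudict`. With NLTK installed, `nltk.data.find("taggers/averaged_perceptron_tagger")` is the authoritative lookup; it raises `LookupError` when the resource is missing.

Below is a standard-library sketch that works without NLTK. Its search list is a simplified approximation of NLTK's default Linux search path, not NLTK's own code. For each resource, it reports whether the extracted folder, the `.zip` archive, or both exist. It is only an assumption that some consumer may look for the `.zip` specifically. If one does, keeping only the extracted folder could still trigger a download attempt.

```python
import os
import sys

# Resources the log failed to download (Connection refused).
REQUIRED = ("taggers/averaged_perceptron_tagger", "corpora/cmudict")


def nltk_search_paths():
    """Approximate NLTK's default data search path on Linux (simplified assumption)."""
    paths = [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    paths.append(os.path.expanduser("~/nltk_data"))
    for sub in ("nltk_data", "share/nltk_data", "lib/nltk_data"):
        paths.append(os.path.join(sys.prefix, sub))  # sys.prefix is the venv if one is active
    paths += ["/usr/share/nltk_data", "/usr/local/share/nltk_data",
              "/usr/lib/nltk_data", "/usr/local/lib/nltk_data"]
    return paths


def locate(resource, paths):
    """Return which forms ('folder' and/or 'zip') of a resource exist in the given paths."""
    found = {}
    for base in paths:
        folder = os.path.join(base, resource)
        if os.path.isdir(folder):
            found.setdefault("folder", folder)
        if os.path.isfile(folder + ".zip"):
            found.setdefault("zip", folder + ".zip")
    return found


if __name__ == "__main__":
    search = nltk_search_paths()
    for res in REQUIRED:
        print(res, locate(res, search) or "NOT FOUND")
```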
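If fine-tuning on 12 sentences gives poor results, it is usually because the dataset is too small. We recommend enlarging it, typically to 300 ~ 600 utterances; the more and the better the data, the better the synthesized speech. The recordings should have no reverberation and no background noise, and should be made at a moderate distance from the microphone; the Biaobei (标贝) dataset is a good reference for the expected quality.
The timbre after fine-tuning depends on how similar the target speaker is to the original speaker: the more similar they are, the closer the fine-tuned voice is to the target speaker.
The audio quality after fine-tuning also depends on the original speaker's audio quality: if the original speaker's recordings are poor, the fine-tuned result may be poor too.
In short, the fine-tuning approach requires care both in collecting data and in choosing the original speaker.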
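You can compare a dataset against the 300 ~ 600 utterance recommendation by counting the WAV files and summing their duration. Below is a standard-library sketch of that check. It assumes uncompressed PCM `.wav` files, because the `wave` module cannot read other encodings. It does not measure noise or reverberation, which still need listening checks. The helper names are hypothetical.

```python
import wave
from pathlib import Path


def summarize_wavs(data_dir):
    """Return (number of .wav files, total duration in seconds) under data_dir."""
    count, total = 0, 0.0
    for path in sorted(Path(data_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as w:
            total += w.getnframes() / w.getframerate()
        count += 1
    return count, total


def advice(count, low=300, high=600):
    """Compare an utterance count with the 300 ~ 600 range recommended above."""
    if count < low:
        return f"{count} utterances: likely too few; aim for {low}~{high}"
    return f"{count} utterances: within or above the recommended range"
```

For example, `advice(summarize_wavs("my_data")[0])` with a hypothetical `my_data` folder holding 12 files would report that 12 utterances are likely too few.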
For how few-shot fine-tuning works, see 关于训练一个自己的TTS模型 ("On training your own TTS model").