Closed sixyang closed 2 years ago
I see there is already documentation: the AM finetune doc and the VOC finetune doc. Thank you very much!
I ran into a problem with mfa_align. The error is shown below:
root@container-49581189ae-dcc2b933:~/autodl-tmp/PaddleSpeech-develop/examples/other/tts_finetune/tts3# ./run.sh
/root/miniconda3/lib/python3.8/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
dtype=np.complex,
mfa_align input/csmsc_mini/newdir tools/aligner/simple.lexicon tools/aligner/aishell3_model.zip mfa_result
align.py:60: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 198.0
/root/autodl-tmp/PaddleSpeech-develop/examples/other/tts_finetune/tts3/tools/montreal-forced-aligner/lib/aligner/models.py:87: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Creating dictionary information...
Setting up training data...
Calculating MFCCs...
Traceback (most recent call last):
File "aligner/command_line/align.py", line 186, in <module>
File "aligner/command_line/align.py", line 142, in validate_args
File "aligner/command_line/align.py", line 94, in align_corpus
File "aligner/aligner/pretrained.py", line 74, in __init__
File "aligner/aligner/pretrained.py", line 122, in setup
File "aligner/aligner/base.py", line 89, in setup
File "aligner/corpus.py", line 979, in initialize_corpus
File "aligner/corpus.py", line 852, in create_mfccs
File "aligner/corpus.py", line 863, in _combine_feats
FileNotFoundError: [Errno 2] No such file or directory: '/root/Documents/MFA/newdir/train/mfcc/raw_mfcc.0.scp'
[54706] Failed to execute script align
159 20
100%|██████████████████████████████████████████████████████████████████████████| 159/159 [00:00<00:00, 18030.51it/s]
Done
Traceback (most recent call last):
File "finetune.py", line 179, in <module>
extract_feature(duration_file, config, input_dir, dump_dir,
File "/root/autodl-tmp/PaddleSpeech-develop/examples/other/tts_finetune/tts3/local/extract.py", line 251, in extract_feature
normalize(speech_scaler, pitch_scaler, energy_scaler, vocab_phones,
File "/root/autodl-tmp/PaddleSpeech-develop/examples/other/tts_finetune/tts3/local/extract.py", line 141, in normalize
dataset = DataTable(
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/datasets/data_table.py", line 45, in __init__
assert len(data) > 0, "This dataset has no examples"
AssertionError: This dataset has no examples
I traced the failure to the MFCC computation step and found a fix by searching online (see 149 and 91).
rm tts3/tools/montreal-forced-aligner/lib/thirdparty/bin/libopenblas.so.0
sudo apt install libopenblas-dev
In other words, do not use the bundled libopenblas.so.0; install the latest package instead.
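Note that "voc finetune" here does not mean fine-tuning with your own dataset. It refers to the fine-tuning described in the HiFi-GAN paper, which uses mel spectrograms generated by the AM.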
So if I want to fine-tune with my own data, the AM document alone is enough, right?
When I ran it with my own data I got the error below. What is going wrong?
root@container-49581189ae-dcc2b933:~/autodl-tmp/PaddleSpeech-develop/examples/other/tts_finetune/tts3# ./run.sh
/root/miniconda3/lib/python3.8/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
dtype=np.complex,
align.py:60: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 16.0
/root/autodl-tmp/PaddleSpeech-develop/examples/other/tts_finetune/tts3/tools/montreal-forced-aligner/lib/aligner/models.py:87: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Creating dictionary information...
Setting up training data...
Calculating MFCCs...
Calculating CMVN...
Number of speakers in corpus: 1, average number of utterances per speaker: 16.0
Done with setup.
100%|#######################################################################################################| 2/2 [00:02<00:00, 1.28s/it]
Done! Everything took 11.016074180603027 seconds
13 2
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:02<00:00, 4.51it/s]
Done
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 601.37it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.51it/s]
Done
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 337.56it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 9.54it/s]
Done
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 409.64it/s]
rank: 0, pid: 325888, parent_pid: 325882
multiple speaker fastspeech2!
spk_num: 174
samplers done!
dataloaders done!
vocab_size: 306
W0828 18:51:54.657121 325888 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.5, Runtime API Version: 11.2
W0828 18:51:54.660507 325888 device_context.cc:465] device: 0, cuDNN Version: 8.1.
model done!
optimizer done!
Exception in main training loop:
Traceback (most recent call last):
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/training/trainer.py", line 149, in run
update()
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/training/updaters/standard_updater.py", line 107, in update
batch = self.read_batch()
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/training/updaters/standard_updater.py", line 180, in read_batch
batch = next(self.train_iterator)
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 722, in __next__
six.reraise(*sys.exc_info())
File "/root/miniconda3/lib/python3.8/site-packages/six.py", line 719, in reraise
raise value
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 697, in __next__
data = self._reader.read_next_var_list()
Trainer extensions will try to handle the extension. Then all extensions will finalize.
Traceback (most recent call last):
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/training/updaters/standard_updater.py", line 177, in read_batch
batch = next(self.train_iterator)
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 722, in __next__
six.reraise(*sys.exc_info())
File "/root/miniconda3/lib/python3.8/site-packages/six.py", line 719, in reraise
raise value
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 697, in __next__
data = self._reader.read_next_var_list()
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "finetune.py", line 194, in <module>
train_sp(train_args, config)
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/exps/fastspeech2/train.py", line 165, in train_sp
trainer.run()
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/training/trainer.py", line 198, in run
six.reraise(*exc_info)
File "/root/miniconda3/lib/python3.8/site-packages/six.py", line 719, in reraise
raise value
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/training/trainer.py", line 149, in run
update()
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/training/updaters/standard_updater.py", line 107, in update
batch = self.read_batch()
File "/root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/training/updaters/standard_updater.py", line 180, in read_batch
batch = next(self.train_iterator)
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 722, in __next__
six.reraise(*sys.exc_info())
File "/root/miniconda3/lib/python3.8/site-packages/six.py", line 719, in reraise
raise value
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 697, in __next__
data = self._reader.read_next_var_list()
StopIteration
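I recorded my data with a microphone and converted it from m4a to wav with ffmpeg. The frame_rate is 44.1 kHz, the same as aishell3. Because attachments are limited to 25 MB, I am sharing the data through a Baidu Netdisk link (https://pan.baidu.com/s/1fW_zLvWFu4u61QK_j5dxfA extraction code: 64gu)
Before this, I ran the whole pipeline with your csmsc_mini data and it worked, but it fails with my own data. I also deleted a few csmsc utterances and ran it again, and it still worked, so I suspect this is not a caching problem. Thanks!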
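Since the post relies on the recordings matching the 44.1 kHz rate, a quick check of the converted files can rule out a sample-rate mismatch before debugging further. The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_info` and `has_sample_rate` are made up for illustration and are not part of PaddleSpeech.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getnchannels(), wf.getsampwidth()


def has_sample_rate(path, expected_hz=44100):
    """True if the file's sample rate equals expected_hz (44.1 kHz by default)."""
    rate, _, _ = wav_info(path)
    return rate == expected_hz
```

Calling `has_sample_rate("some_utterance.wav")` on each converted file (the file name is a placeholder) shows whether ffmpeg kept the intended rate. Note that `wave` only reads uncompressed PCM WAV files.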
Try reducing batch_size, for example to 4, because you only have a dozen or so utterances. The default batch_size is 64.
That works! Thank you very much!
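A likely reason the smaller batch_size helps: if the training sampler drops the last incomplete batch (an assumption here, not confirmed in this thread), then a dataset smaller than batch_size yields zero batches, and the first `next(self.train_iterator)` raises StopIteration as in the traceback above. The sketch below only does the arithmetic; `num_batches` is an illustrative helper, and 13 is taken from the "13 2" line in the log, which appears to be the train/dev split.

```python
def num_batches(num_examples: int, batch_size: int, drop_last: bool = True) -> int:
    """Number of batches one pass over the data produces."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, remainder = divmod(num_examples, batch_size)
    # With drop_last=True the trailing partial batch is discarded.
    if drop_last or remainder == 0:
        return full
    return full + 1


for bs in (64, 4):
    print(f"batch_size={bs}: {num_batches(13, bs)} batch(es) per epoch")
# batch_size=64: 0 batch(es) per epoch
# batch_size=4: 3 batch(es) per epoch
```

With 13 training utterances, any batch_size up to 13 gives at least one batch under this assumption.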
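I set stop_stage fairly high, but the training step does not run at all. Why is that? No value of stop_stage makes any difference. The model being loaded is snapshot_iter_96800.pdz, which I had already trained a little myself; the one you originally provide is snapshot_iter_96400.pdz.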
root@container-49581189ae-dcc2b933:~/autodl-tmp/PaddleSpeech-develop/examples/other/tts_finetune/tts3# ./run.sh
/root/miniconda3/lib/python3.8/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
dtype=np.complex,
align.py:60: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 83.0
/root/autodl-tmp/PaddleSpeech-develop/examples/other/tts_finetune/tts3/tools/montreal-forced-aligner/lib/aligner/models.py:87: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Creating dictionary information...
Setting up training data...
Calculating MFCCs...
Calculating CMVN...
Number of speakers in corpus: 1, average number of utterances per speaker: 83.0
Done with setup.
100%|#####################################################################################################| 2/2 [00:04<00:00, 2.13s/it]
Done! Everything took 22.48857855796814 seconds
67 9
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 67/67 [00:08<00:00, 7.87it/s]
Done
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 67/67 [00:00<00:00, 579.46it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 9.17it/s]
Done
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 185.14it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 6.90it/s]
Done
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 519.07it/s]
rank: 0, pid: 116695, parent_pid: 116692
multiple speaker fastspeech2!
spk_num: 174
samplers done!
dataloaders done!
vocab_size: 306
W0830 16:50:32.274386 116695 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.5, Runtime API Version: 11.2
W0830 16:50:32.277689 116695 device_context.cc:465] device: 0, cuDNN Version: 8.1.
model done!
optimizer done!
in hifigan syn_e2e
/root/miniconda3/lib/python3.8/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
dtype=np.complex,
========Args========
am: fastspeech2_aishell3
am_ckpt: ./exp/default/checkpoints/snapshot_iter_97800.pdz
am_config: ./pretrained_models/fastspeech2_aishell3_ckpt_1.1.0/default.yaml
am_stat: ./pretrained_models/fastspeech2_aishell3_ckpt_1.1.0/speech_stats.npy
inference_dir: null
lang: zh
ngpu: 1
output_dir: ./test_e2e
phones_dict: ./dump/phone_id_map.txt
speaker_dict: ./dump/speaker_id_map.txt
spk_id: 0
text: /root/autodl-tmp/PaddleSpeech-develop/paddlespeech/t2s/exps/fastspeech2/../sentences.txt
tones_dict: null
voc: hifigan_aishell3
voc_ckpt: pretrained_models/hifigan_aishell3_ckpt_0.2.0/snapshot_iter_2500000.pdz
voc_config: pretrained_models/hifigan_aishell3_ckpt_0.2.0/default.yaml
voc_stat: pretrained_models/hifigan_aishell3_ckpt_0.2.0/feats_stats.npy
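Found the problem: the epoch parameter defaults to 100, and raising it fixes this. I am still curious, though. In principle this continues training from the pretrained model, so stage and stop_stage alone should control whether training runs, yet here I also have to raise epoch.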
The reason is that the epoch count stored in your snapshot_iter_96800.pdz has already reached the target (it is greater than the epoch in default.yaml + args.epoch), so the program considers training finished: https://github.com/PaddlePaddle/PaddleSpeech/blob/e147b96cf08df04f079105377d2348933dec5f0b/examples/other/tts_finetune/tts3/finetune.py#L150
You can paddle.load() both snapshot_iter_96800.pdz and snapshot_iter_96400.pdz and compare the value of 'ckpt'.
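The stopping condition described in this reply can be sketched as below. This is not the code from finetune.py; the function name and all numbers are hypothetical, and it only assumes what the reply states: training targets the epoch in default.yaml plus args.epoch, and a checkpoint that has already reached that target leaves nothing to train.

```python
def remaining_epochs(ckpt_epoch: int, yaml_epoch: int, extra_epoch: int) -> int:
    """Epochs still to run, given the epoch counter restored from a checkpoint."""
    target_epoch = yaml_epoch + extra_epoch  # default.yaml epoch + args.epoch
    return max(0, target_epoch - ckpt_epoch)


# Hypothetical numbers, for illustration only.
print(remaining_epochs(ckpt_epoch=100, yaml_epoch=10, extra_epoch=100))  # 10
print(remaining_epochs(ckpt_epoch=120, yaml_epoch=10, extra_epoch=100))  # 0
```

When the result is 0, the training loop exits immediately and the script moves on to synthesis, which would match the log above. Raising epoch increases the target and gives a positive result again.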
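OK, thanks!
As the title says: for TTS, I want to add some extra data to the aishell3 data for training (same sampling rate). For the AM and the vocoder, do I need anything besides the text and the audio? I saw in other issues that adding a speaker_id to the aishell3 data is enough. Could you roughly describe the remaining steps? Thank you very much!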