PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0
10.94k stars 1.83k forks

Is it possible to clone an English voice using PaddleSpeech? I have an English speaker and want to do TTS using her voice. #3309

Open arnav-newzera opened 1 year ago

arnav-newzera commented 1 year ago

Is it possible to clone an English voice using PaddleSpeech? I have an English speaker, and I want to do TTS using her voice.

zh794390558 commented 1 year ago

You can try fine-tuning the English TTS model. See https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/other/tts_finetune/tts3/README.md

arnav-newzera commented 1 year ago

Thanks for your reply. I'll try it now.

arnav-newzera commented 1 year ago

I tried what you suggested, but I hit an error at this stage: extract feature

/bin/bash: /home/newzera/anaconda3/envs/paddlespeech/lib/libtinfo.so.6: no version information available (required by /bin/bash)
check oov
get mfa result
align.py:60: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 602.0
/mnt/msd/users/arnav/PaddleSpeech/examples/other/tts_finetune/tts3/tools/montreal-forced-aligner/lib/aligner/models.py:87: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Creating dictionary information...
Setting up training data...
Calculating MFCCs...
Traceback (most recent call last):
  File "aligner/command_line/align.py", line 186, in <module>
  File "aligner/command_line/align.py", line 142, in validate_args
  File "aligner/command_line/align.py", line 94, in align_corpus
  File "aligner/aligner/pretrained.py", line 74, in __init__
  File "aligner/aligner/pretrained.py", line 122, in setup
  File "aligner/aligner/base.py", line 89, in setup
  File "aligner/corpus.py", line 979, in initialize_corpus
  File "aligner/corpus.py", line 852, in create_mfccs
  File "aligner/corpus.py", line 863, in _combine_feats
FileNotFoundError: [Errno 2] No such file or directory: '/home/arnav/Documents/MFA/newdir/train/mfcc/raw_mfcc.0.scp'
[68599] Failed to execute script align
generate durations.txt
extract feature
/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype=np.complex,
600 1
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 600/600 [00:00<00:00, 67646.43it/s]
Done
Traceback (most recent call last):
  File "local/extract_feature.py", line 352, in <module>
    replace_spkid=args.replace_spkid)
  File "local/extract_feature.py", line 267, in extract_feature
    vocab_speaker, dump_dir, "train")
  File "local/extract_feature.py", line 160, in normalize
    "energy": np.load,
  File "/mnt/msd/users/arnav/PaddleSpeech/paddlespeech/t2s/datasets/data_table.py", line 47, in __init__
    assert len(data) > 0, "This dataset has no examples"
AssertionError: This dataset has no examples

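I looked through the solutions suggested in other issues (a dataset that is too small is one cause, but mine has 1000+ wavs; converting the audio to 16-bit, 48 kHz is another), but the error persists.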
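To confirm what format the converted files actually ended up in, a check along these lines could be run over the dataset folder. This is only a sketch using Python's standard `wave` module; the function name `summarize_wav_formats` and the flat folder of `.wav` files are assumptions for illustration, not part of PaddleSpeech.

```python
import wave
from collections import Counter
from pathlib import Path


def summarize_wav_formats(wav_dir):
    """Count the (sample_rate_hz, bits_per_sample, channels) combinations in wav_dir.

    Hypothetical helper: assumes uncompressed PCM .wav files sit directly in wav_dir.
    """
    formats = Counter()
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # The wave module only reads the header fields of PCM WAV files here.
        with wave.open(str(path), "rb") as wav_file:
            key = (
                wav_file.getframerate(),
                wav_file.getsampwidth() * 8,
                wav_file.getnchannels(),
            )
        formats[key] += 1
    return formats
```

If the conversion was applied uniformly, the returned counter should hold a single key, for example `(48000, 16, 1)` for mono files. More than one key would mean some files were missed.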

zh794390558 commented 1 year ago

Try these versions: librosa==0.8.1 numpy==1.23.5

arnav-newzera commented 1 year ago

I installed them, but it still shows the same error.

NLPerxue commented 1 year ago

Regarding the "extract feature" error reported above: I suggest first testing with the official sample voice data to see whether the pipeline runs. If it does, the problem is in your own dataset.

lisc199 commented 1 year ago

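I ran into this problem too during TTS fine-tuning. Looking at the code, I found that the MFA results are saved under ./mfa_result, but the code that reads them expects an extra speaker ID level in the path. So you need to create a new folder under mfa_result and move all of the MFA results into it.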
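The workaround described above could be sketched as follows. This is a minimal illustration, not code from PaddleSpeech: the helper `nest_mfa_results` and its `speaker_dir_name` argument are hypothetical, and the thread does not say what the folder must be called, so the name is left as a parameter.

```python
import shutil
from pathlib import Path


def nest_mfa_results(mfa_dir, speaker_dir_name):
    """Move every file directly under mfa_dir into mfa_dir/speaker_dir_name.

    Hypothetical helper: speaker_dir_name is an assumption, since the required
    folder name is not stated in the thread.
    """
    mfa_dir = Path(mfa_dir)
    target = mfa_dir / speaker_dir_name
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    for item in sorted(mfa_dir.iterdir()):
        # Only move plain files; existing subfolders (including target) stay put.
        if item.is_file():
            shutil.move(str(item), str(target / item.name))
            moved.append(item.name)
    return moved
```

A call such as `nest_mfa_results("./mfa_result", "my_speaker")` would leave `./mfa_result/my_speaker/` holding the alignment files, where `my_speaker` is a placeholder. Running it a second time moves nothing, because only files at the top level are touched.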


Tony-xubiao commented 1 year ago

Hello, I ran into the same problem. What should the new folder be named: the speaker ID, or the name of the dataset?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.