PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, Wav2Lip, picture repair, image editing, photo2cartoon, image style transfer, GPEN, and so on.
Cannot use PaddleSpeech for voice cloning: ValueError: (InvalidArgument) Deserialize to tensor failed, maybe the loaded file is not a paddle model (expected file format: 0, but 589505315 found). #793
- I'm running the code using:
`CUDA_VISIBLE_DEVICES=0 ./voice_cloning.sh /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/conf/default.yaml /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/pretrained fastspeech2_nosil_aishell3_vc1_ckpt_0.5 /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/local/ge2e_ckpt_0.3/step-3000000.pdparams /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/saloni`
- This is the output I'm getting:
/bin/bash: /home/newzera/anaconda3/envs/paddlespeech/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: np.complex is a deprecated alias for the builtin complex. To silence this warning, use complex by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.complex128 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
dtype=np.complex,
/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
========Args========
am: fastspeech2_aishell3
am_ckpt: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/pretrained/checkpoints/fastspeech2_nosil_aishell3_vc1_ckpt_0.5
am_config: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/conf/default.yaml
am_stat: dump/train/speech_stats.npy
ge2e_params_path: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/local/ge2e_ckpt_0.3/step-3000000.pdparams
input_dir: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/saloni
ngpu: 1
output_dir: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/pretrained/vc_syn
phones_dict: dump/phone_id_map.txt
text: Hello my name is saloni
use_ecapa: false
voc: pwgan_aishell3
voc_ckpt: pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz
voc_config: pwg_aishell3_ckpt_0.5/default.yaml
voc_stat: pwg_aishell3_ckpt_0.5/feats_stats.npy
========Config========
batch_size: 64
f0max: 400
f0min: 80
fmax: 7600
fmin: 80
fs: 24000
max_epoch: 200
model:
adim: 384
aheads: 2
decoder_normalize_before: True
dlayers: 4
dunits: 1536
duration_predictor_chans: 256
duration_predictor_kernel_size: 3
duration_predictor_layers: 2
elayers: 4
encoder_normalize_before: True
energy_embed_dropout: 0.0
energy_embed_kernel_size: 1
energy_predictor_chans: 256
energy_predictor_dropout: 0.5
energy_predictor_kernel_size: 3
energy_predictor_layers: 2
eunits: 1536
init_dec_alpha: 1.0
init_enc_alpha: 1.0
init_type: xavier_uniform
pitch_embed_dropout: 0.0
pitch_embed_kernel_size: 1
pitch_predictor_chans: 256
pitch_predictor_dropout: 0.5
pitch_predictor_kernel_size: 5
pitch_predictor_layers: 5
positionwise_conv_kernel_size: 3
positionwise_layer_type: conv1d
postnet_chans: 256
postnet_filts: 5
postnet_layers: 5
reduction_factor: 1
spk_embed_dim: 256
spk_embed_integration_type: concat
stop_gradient_from_energy_predictor: False
stop_gradient_from_pitch_predictor: True
transformer_dec_attn_dropout_rate: 0.2
transformer_dec_dropout_rate: 0.2
transformer_dec_positional_dropout_rate: 0.2
transformer_enc_attn_dropout_rate: 0.2
transformer_enc_dropout_rate: 0.2
transformer_enc_positional_dropout_rate: 0.2
use_scaled_pos_enc: True
n_fft: 2048
n_mels: 80
n_shift: 300
num_snapshots: 5
num_workers: 2
optimizer:
learning_rate: 0.001
optim: adam
seed: 10086
updater:
use_masking: True
win_length: 1200
window: hann
allow_cache: True
batch_max_steps: 24000
batch_size: 8
discriminator_grad_norm: 1
discriminator_optimizer_params:
epsilon: 1e-06
weight_decay: 0.0
discriminator_params:
bias: True
conv_channels: 64
in_channels: 1
kernel_size: 3
layers: 10
nonlinear_activation: LeakyReLU
nonlinear_activation_params:
negative_slope: 0.2
out_channels: 1
use_weight_norm: True
discriminator_scheduler_params:
gamma: 0.5
learning_rate: 5e-05
step_size: 200000
discriminator_train_start_steps: 100000
eval_interval_steps: 1000
fmax: 7600
fmin: 80
fs: 24000
generator_grad_norm: 10
generator_optimizer_params:
epsilon: 1e-06
weight_decay: 0.0
generator_params:
aux_channels: 80
aux_context_window: 2
dropout: 0.0
gate_channels: 128
in_channels: 1
kernel_size: 3
layers: 30
out_channels: 1
residual_channels: 64
skip_channels: 64
stacks: 3
upsample_scales: [4, 5, 3, 5]
use_weight_norm: True
generator_scheduler_params:
gamma: 0.5
learning_rate: 0.0001
step_size: 200000
lambda_adv: 4.0
n_fft: 2048
n_mels: 80
n_shift: 300
num_save_intermediate_results: 4
num_snapshots: 10
num_workers: 4
pin_memory: True
remove_short_samples: True
save_interval_steps: 5000
seed: 42
stft_loss_params:
fft_sizes: [1024, 2048, 512]
hop_sizes: [120, 240, 50]
win_lengths: [600, 1200, 240]
window: hann
train_max_steps: 1000000
win_length: 1200
window: hann
Audio Processor Done!
W0602 10:52:56.641338 144178 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.0, Runtime API Version: 10.2
W0602 10:52:56.642062 144178 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
GE2E Done!
[2023-06-02 10:53:00,516] [ INFO] - Already cached /home/arnav-newzera/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-06-02 10:53:00,524] [ INFO] - tokenizer config file saved in /home/arnav-newzera/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-06-02 10:53:00,524] [ INFO] - Special tokens file saved in /home/arnav-newzera/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
frontend done!
Building prefix dict from the default dictionary ...
[2023-06-02 10:53:00] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2023-06-02 10:53:00] [DEBUG] [__init__.py:133] Loading model from cache /tmp/jieba.cache
Loading model cost 0.499 seconds.
[2023-06-02 10:53:01] [DEBUG] [__init__.py:165] Loading model cost 0.499 seconds.
Prefix dict has been built successfully.
[2023-06-02 10:53:01] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
Traceback (most recent call last):
File "/mnt/msd/users/arnav/PaddleSpeech/paddlespeech/t2s/exps/voice_cloning.py", line 233, in <module>
main()
File "/mnt/msd/users/arnav/PaddleSpeech/paddlespeech/t2s/exps/voice_cloning.py", line 229, in main
voice_cloning(args)
File "/mnt/msd/users/arnav/PaddleSpeech/paddlespeech/t2s/exps/voice_cloning.py", line 106, in voice_cloning
phones_dict=args.phones_dict)
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddlespeech/t2s/exps/syn_utils.py", line 371, in get_am_inference
am.set_state_dict(paddle.load(am_ckpt)["main_params"])
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/framework/io.py", line 1103, in load
load_result = _legacy_load(path, *configs)
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/framework/io.py", line 1150, in _legacy_load
load_result = _load_state_dict_from_save_params(model_path)
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/framework/io.py", line 147, in _load_state_dict_from_save_params
attrs={'file_path': os.path.join(model_path, name)},
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py", line 314, in trace_op
stop_gradient, inplace_map)
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py", line 176, in eager_legacy_trace_op
returns = function_ptr(arg_list, *attrs_list)
ValueError: (InvalidArgument) Deserialize to tensor failed, maybe the loaded file is not a paddle model(expected file format: 0, but 589505315 found).
[Hint: Expected version == 0U, but received version:589505315 != 0U:0.] (at /paddle/paddle/phi/core/serialization.cc:106)
[operator < load > error]
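One thing that can be checked from the message alone is what the reported "file format" number looks like as raw bytes. The snippet below is a minimal sketch, assuming the loader reads the first four bytes of the file as a little-endian unsigned 32-bit version field (that layout is an assumption, not something the log confirms).

```python
import struct

# Value copied from the ValueError above.
reported_version = 589505315

# Assumption: the version field is a little-endian uint32 taken from the
# first four bytes of whatever file was opened.
raw = struct.pack("<I", reported_version)

print(hex(reported_version))  # 0x23232323
print(raw)                    # b'####'
```

All four bytes are `#` characters, so the byte order does not change the result. That would be consistent with the loader having opened a text file that starts with a run of `#` (for example a commented config file) instead of a serialized tensor, although the log by itself does not prove which file was read.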
Can anyone tell me what I did wrong, or how to resolve the error? Please be forgiving, as I'm new to this.
I wanted to test the PaddleSpeech repo to clone a voice. My target text is English (is that possible?). Here are the steps that I've taken:
(`/mnt/msd/users/arnav/` is my workspace) and installed dependencies

config_path=$1
train_output_path=$2
ckpt_name=$3
ge2e_params_path=$4
ref_audio_dir=$5

python3 /mnt/msd/users/arnav/PaddleSpeech/paddlespeech/t2s/exps/voice_cloning.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
    --am_ckpt=${train_output_path}/checkpoints/${ckpt_name} \
    --am_stat=dump/train/speech_stats.npy \
    --voc=pwgan_aishell3 \
    --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
    --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
    --voc_stat=pwg_aishell3_ckpt_0.5/feats_stats.npy \
    --ge2e_params_path=${ge2e_params_path} \
    --text="Hello my name is saloni" \
    --input-dir=${ref_audio_dir} \
    --output-dir=${train_output_path}/vc_syn \
    --phones-dict=dump/phone_id_map.txt
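In the printed Args, `am_ckpt` resolves to `.../pretrained/checkpoints/fastspeech2_nosil_aishell3_vc1_ckpt_0.5`, which has no file extension, while `voc_ckpt` points at a `.pdz` file. The traceback also passes through `_legacy_load` and `os.path.join(model_path, name)`, which suggests the path may have been treated as a directory. The sketch below is one way to check what a path passed as `--am_ckpt` actually is before running the script. `describe_checkpoint_path` is a hypothetical helper written for this post, not part of PaddleSpeech, and it assumes acoustic-model checkpoints use the same `.pdz` suffix as the vocoder checkpoint.

```python
from pathlib import Path


def describe_checkpoint_path(path_str):
    """Return a short diagnosis for a path intended for --am_ckpt.

    Hypothetical helper: it only inspects the filesystem and never
    loads the checkpoint.
    """
    path = Path(path_str)
    if not path.exists():
        return "missing"
    if path.is_dir():
        # List .pdz files below the directory as candidate checkpoints.
        candidates = sorted(str(p.relative_to(path)) for p in path.rglob("*.pdz"))
        return "directory; candidate files: " + (", ".join(candidates) or "none")
    if path.suffix != ".pdz":
        # Assumption: checkpoints end in .pdz, like the vocoder one in the Args.
        return "file without .pdz suffix"
    return "ok"
```

Calling it on the `am_ckpt` value from the Args would show whether that path is a directory and, if so, which `.pdz` files sit under it. If it is a directory, passing one of those files as the checkpoint would be the first thing to try.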