Open arnav-newzera opened 1 year ago
Check whether the ckpt is corrupted.
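A minimal sketch of a first sanity check on the checkpoint path, using only the Python standard library. The helper name `checkpoint_problems` is hypothetical and not part of PaddleSpeech. It only tests that the path exists, is a regular file, and is non-empty, so it cannot prove the contents are a valid Paddle checkpoint.

```python
import os


def checkpoint_problems(path):
    """Return a list of basic problems found with a checkpoint path.

    Hypothetical helper for illustration; an empty list only means that
    none of these simple checks failed.
    """
    problems = []
    if not os.path.exists(path):
        problems.append("path does not exist")
    elif os.path.isdir(path):
        # A directory was passed where a single checkpoint file is expected.
        problems.append("path is a directory, not a single checkpoint file")
    elif os.path.getsize(path) == 0:
        # Zero bytes often points to an interrupted download or copy.
        problems.append("file is empty")
    return problems
```

For example, calling it with the value passed to `--am_ckpt` and getting back an empty list means none of these basic problems were found; the file could still be damaged in other ways.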
Have you solved this problem? I ran into the same issue in keyword spotting.
I just solved it. If you have already downloaded the model, you can set both config and ckpt_path to None.
I wanted to test the PaddleSpeech repo to clone a voice. My target text is English (is that possible?). Here are the steps I've taken.
(/mnt/msd/users/arnav/ is my workspace) and installed dependencies

config_path=$1
train_output_path=$2
ckpt_name=$3
ge2e_params_path=$4
ref_audio_dir=$5

python3 /mnt/msd/users/arnav/PaddleSpeech/paddlespeech/t2s/exps/voice_cloning.py \
    --am=fastspeech2_aishell3 \
    --am_config=${config_path} \
    --am_ckpt=${train_output_path}/checkpoints/${ckpt_name} \
    --am_stat=dump/train/speech_stats.npy \
    --voc=pwgan_aishell3 \
    --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
    --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
    --voc_stat=pwg_aishell3_ckpt_0.5/feats_stats.npy \
    --ge2e_params_path=${ge2e_params_path} \
    --text="Hello my name is saloni" \
    --input-dir=${ref_audio_dir} \
    --output-dir=${train_output_path}/vc_syn \
    --phones-dict=dump/phone_id_map.txt
/bin/bash: /home/newzera/anaconda3/envs/paddlespeech/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype=np.complex,
/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
========Args========
am: fastspeech2_aishell3
am_ckpt: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/pretrained/checkpoints/fastspeech2_nosil_aishell3_vc1_ckpt_0.5
am_config: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/conf/default.yaml
am_stat: dump/train/speech_stats.npy
ge2e_params_path: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/local/ge2e_ckpt_0.3/step-3000000.pdparams
input_dir: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/saloni
ngpu: 1
output_dir: /mnt/msd/users/arnav/PaddleSpeech/examples/aishell3/vc1/pretrained/vc_syn
phones_dict: dump/phone_id_map.txt
text: Hello my name is saloni
use_ecapa: false
voc: pwgan_aishell3
voc_ckpt: pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz
voc_config: pwg_aishell3_ckpt_0.5/default.yaml
voc_stat: pwg_aishell3_ckpt_0.5/feats_stats.npy
========Config========
batch_size: 64
f0max: 400
f0min: 80
fmax: 7600
fmin: 80
fs: 24000
max_epoch: 200
model:
  adim: 384
  aheads: 2
  decoder_normalize_before: True
  dlayers: 4
  dunits: 1536
  duration_predictor_chans: 256
  duration_predictor_kernel_size: 3
  duration_predictor_layers: 2
  elayers: 4
  encoder_normalize_before: True
  energy_embed_dropout: 0.0
  energy_embed_kernel_size: 1
  energy_predictor_chans: 256
  energy_predictor_dropout: 0.5
  energy_predictor_kernel_size: 3
  energy_predictor_layers: 2
  eunits: 1536
  init_dec_alpha: 1.0
  init_enc_alpha: 1.0
  init_type: xavier_uniform
  pitch_embed_dropout: 0.0
  pitch_embed_kernel_size: 1
  pitch_predictor_chans: 256
  pitch_predictor_dropout: 0.5
  pitch_predictor_kernel_size: 5
  pitch_predictor_layers: 5
  positionwise_conv_kernel_size: 3
  positionwise_layer_type: conv1d
  postnet_chans: 256
  postnet_filts: 5
  postnet_layers: 5
  reduction_factor: 1
  spk_embed_dim: 256
  spk_embed_integration_type: concat
  stop_gradient_from_energy_predictor: False
  stop_gradient_from_pitch_predictor: True
  transformer_dec_attn_dropout_rate: 0.2
  transformer_dec_dropout_rate: 0.2
  transformer_dec_positional_dropout_rate: 0.2
  transformer_enc_attn_dropout_rate: 0.2
  transformer_enc_dropout_rate: 0.2
  transformer_enc_positional_dropout_rate: 0.2
  use_scaled_pos_enc: True
n_fft: 2048
n_mels: 80
n_shift: 300
num_snapshots: 5
num_workers: 2
optimizer:
  learning_rate: 0.001
  optim: adam
seed: 10086
updater:
  use_masking: True
win_length: 1200
window: hann

allow_cache: True
batch_max_steps: 24000
batch_size: 8
discriminator_grad_norm: 1
discriminator_optimizer_params:
  epsilon: 1e-06
  weight_decay: 0.0
discriminator_params:
  bias: True
  conv_channels: 64
  in_channels: 1
  kernel_size: 3
  layers: 10
  nonlinear_activation: LeakyReLU
  nonlinear_activation_params:
    negative_slope: 0.2
  out_channels: 1
  use_weight_norm: True
discriminator_scheduler_params:
  gamma: 0.5
  learning_rate: 5e-05
  step_size: 200000
discriminator_train_start_steps: 100000
eval_interval_steps: 1000
fmax: 7600
fmin: 80
fs: 24000
generator_grad_norm: 10
generator_optimizer_params:
  epsilon: 1e-06
  weight_decay: 0.0
generator_params:
  aux_channels: 80
  aux_context_window: 2
  dropout: 0.0
  gate_channels: 128
  in_channels: 1
  kernel_size: 3
  layers: 30
  out_channels: 1
  residual_channels: 64
  skip_channels: 64
  stacks: 3
  upsample_scales: [4, 5, 3, 5]
  use_weight_norm: True
generator_scheduler_params:
  gamma: 0.5
  learning_rate: 0.0001
  step_size: 200000
lambda_adv: 4.0
n_fft: 2048
n_mels: 80
n_shift: 300
num_save_intermediate_results: 4
num_snapshots: 10
num_workers: 4
pin_memory: True
remove_short_samples: True
save_interval_steps: 5000
seed: 42
stft_loss_params:
  fft_sizes: [1024, 2048, 512]
  hop_sizes: [120, 240, 50]
  win_lengths: [600, 1200, 240]
  window: hann
train_max_steps: 1000000
win_length: 1200
window: hann
Audio Processor Done!
W0602 10:52:56.641338 144178 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.0, Runtime API Version: 10.2
W0602 10:52:56.642062 144178 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
GE2E Done!
[2023-06-02 10:53:00,516] [ INFO] - Already cached /home/arnav-newzera/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-06-02 10:53:00,524] [ INFO] - tokenizer config file saved in /home/arnav-newzera/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-06-02 10:53:00,524] [ INFO] - Special tokens file saved in /home/arnav-newzera/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
frontend done!
Building prefix dict from the default dictionary ...
[2023-06-02 10:53:00] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2023-06-02 10:53:00] [DEBUG] [__init__.py:133] Loading model from cache /tmp/jieba.cache
Loading model cost 0.499 seconds.
[2023-06-02 10:53:01] [DEBUG] [__init__.py:165] Loading model cost 0.499 seconds.
Prefix dict has been built successfully.
[2023-06-02 10:53:01] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
Traceback (most recent call last):
File "/mnt/msd/users/arnav/PaddleSpeech/paddlespeech/t2s/exps/voice_cloning.py", line 233, in <module>
main()
File "/mnt/msd/users/arnav/PaddleSpeech/paddlespeech/t2s/exps/voice_cloning.py", line 229, in main
voice_cloning(args)
File "/mnt/msd/users/arnav/PaddleSpeech/paddlespeech/t2s/exps/voice_cloning.py", line 106, in voice_cloning
phones_dict=args.phones_dict)
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddlespeech/t2s/exps/syn_utils.py", line 371, in get_am_inference
am.set_state_dict(paddle.load(am_ckpt)["main_params"])
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/framework/io.py", line 1103, in load
load_result = _legacy_load(path, *configs)
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/framework/io.py", line 1150, in _legacy_load
load_result = _load_state_dict_from_save_params(model_path)
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/framework/io.py", line 147, in _load_state_dict_from_save_params
attrs={'file_path': os.path.join(model_path, name)},
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py", line 314, in trace_op
stop_gradient, inplace_map)
File "/home/newzera/anaconda3/envs/paddlespeech/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py", line 176, in eager_legacy_trace_op
returns = function_ptr(arg_list, *attrs_list)
ValueError: (InvalidArgument) Deserialize to tensor failed, maybe the loaded file is not a paddle model(expected file format: 0, but 589505315 found).
[Hint: Expected version == 0U, but received version:589505315 != 0U:0.] (at /paddle/paddle/phi/core/serialization.cc:106)
[operator < load > error]
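A small sketch for reading the odd version number in the error above. It assumes the loader reads a 4-byte little-endian integer from the start of the file as the format version; that detail is an assumption, not something confirmed in this thread. Under that assumption, the reported value decodes to plain ASCII text:

```python
# The "version" value reported in the error message.
version = 589505315

# Reinterpret the integer as the four raw bytes it would have been read from
# (assumption: a 4-byte little-endian field at the start of the file).
raw = version.to_bytes(4, byteorder="little")

print(hex(version))  # 0x23232323
print(raw)           # b'####'
```

If the assumption holds, the bytes `####` suggest the loader was reading a text file (for instance one that begins with a line of `#` characters) rather than binary parameters. That would be consistent with the `am_ckpt` value in the Args dump, which has no `.pdz` file name at the end, and with the traceback going through `_legacy_load` and `os.path.join(model_path, name)`.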