PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0
11.01k stars 1.83k forks

[TTS] Error when training example/csmsc/vits; this error does not seem to have been reported in the existing issues #3437

Open Vebrun opened 1 year ago

Vebrun commented 1 year ago

[2023-07-31 09:23:40] [INFO] [trainer.py:167] iter: 456/350000, Rank: 0, real_loss: 1.465525, fake_loss: 0.950992, discriminator_loss: 2.416517, generator_loss: 48.644768, generator_mel_loss: 37.432968, generator_kl_loss: 2.080599, generator_dur_loss: 2.605819, generator_adv_loss: 2.705353, generator_feat_match_loss: 3.820032, avg_reader_cost: 0.00022 sec, avg_batch_cost: 1.10522 sec, avg_samples: 8, avg_ips: 7.23835 sequences/sec
[2023-07-31 09:23:40] [INFO] [trainer.py:167] iter: 460/350000, Rank: 1, real_loss: 1.823982, fake_loss: 0.803789, discriminator_loss: 2.627771, generator_loss: 55.005493, generator_mel_loss: 41.993717, generator_kl_loss: 2.693164, generator_dur_loss: 2.472222, generator_adv_loss: 3.187515, generator_feat_match_loss: 4.658873, avg_reader_cost: 0.00018 sec, avg_batch_cost: 1.03553 sec, avg_samples: 8, avg_ips: 7.72554 sequences/sec
Exception in main training loop: (InvalidArgument) When step > 0, end should be greater than start, but received end = 31, start = 33.
  [Hint: Expected end >= start, but received end:31 < start:33.] (at /paddle/paddle/phi/kernels/funcs/slice_utils.h:74)
  [operator < slice > error]
Traceback (most recent call last):
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 149, in run
    update()
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/training/updaters/standard_updater.py", line 110, in update
    self.update_core(batch)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/vits/vits_updater.py", line 109, in update_core
    outs = self.model(
  File "/home/user/anaconda3/envs/PadSpe/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/user/anaconda3/envs/PadSpe/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/vits/vits.py", line 262, in forward
    return self._forward_discrminator(
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/vits/vits.py", line 358, in _forward_discrminator
    outs = self.generator(
  File "/home/user/anaconda3/envs/PadSpe/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/user/anaconda3/envs/PadSpe/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/vits/generator.py", line 416, in forward
    z_segments, z_start_idxs = get_random_segments(
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/modules/nets_utils.py", line 314, in get_random_segments
    segments = get_segments(x, start_idxs, segment_size)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/modules/nets_utils.py", line 337, in get_segments
    segments[i] = x[i, :, start_idx:start_idx + segment_size]
  File "/home/user/anaconda3/envs/PadSpe/lib/python3.8/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 736, in __getitem__
    return _getitem_impl_(self, item)
  File "/home/user/anaconda3/envs/PadSpe/lib/python3.8/site-packages/paddle/fluid/variable_index.py", line 486, in _getitem_impl_
    target_block.append_op(
  File "/home/user/anaconda3/envs/PadSpe/lib/python3.8/site-packages/paddle/fluid/framework.py", line 3599, in append_op
    _dygraph_tracer().trace_op(type,
  File "/home/user/anaconda3/envs/PadSpe/lib/python3.8/site-packages/paddle/fluid/dygraph/tracer.py", line 307, in trace_op
    self.trace(type, inputs, outputs, attrs,
[2023-07-31 09:23:41] [INFO] [trainer.py:167] iter: 457/350000, Rank: 0, real_loss: 1.316723, fake_loss: 1.039383, discriminator_loss: 2.356106, generator_loss: 50.838497, generator_mel_loss: 40.326233, generator_kl_loss: 1.498949, generator_dur_loss: 2.676625, generator_adv_loss: 2.627703, generator_feat_match_loss: 3.708988, avg_reader_cost: 0.00022 sec, avg_batch_cost: 1.10919 sec, avg_samples: 8, avg_ips: 7.21250 sequences/sec
Trainer extensions will try to handle the extension.
Then all extensions will finalize.
[2023-07-31 09:23:42] [INFO] [trainer.py:167] iter: 458/350000, Rank: 0, real_loss: 1.545704, fake_loss: 0.938184, discriminator_loss: 2.483888, generator_loss: 50.400703, generator_mel_loss: 39.265820, generator_kl_loss: 2.878182, generator_dur_loss: 2.736290, generator_adv_loss: 2.225367, generator_feat_match_loss: 3.295045, avg_reader_cost: 0.00027 sec, avg_batch_cost: 1.11008 sec, avg_samples: 8, avg_ips: 7.20667 sequences/sec
[2023-07-31 09:23:43] [INFO] [trainer.py:167] iter: 459/350000, Rank: 0, real_loss: 1.157993, fake_loss: 1.243261, discriminator_loss: 2.401254, generator_loss: 48.529289, generator_mel_loss: 38.890598, generator_kl_loss: 1.199729, generator_dur_loss: 2.618741, generator_adv_loss: 2.403082, generator_feat_match_loss: 3.417137, avg_reader_cost: 0.00023 sec, avg_batch_cost: 1.11112 sec, avg_samples: 8, avg_ips: 7.19996 sequences/sec

The error is: When step > 0, end should be greater than start, but received end = 31, start = 33. After the error, three more iterations were logged, and then training hung.
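One way to read the numbers in this message, offered as a hypothesis rather than a confirmed diagnosis: the traceback ends at `x[i, :, start_idx:start_idx + segment_size]`, and a negative `start_idx` would be wrapped around to the end of the axis by the slice operator. The sketch below emulates that wraparound in plain Python. The helper name `wrap_slice_bounds`, the segment size of 32, and the padded length of 34 frames are assumptions chosen to reproduce the reported values; none of them is taken from the PaddleSpeech or Paddle source.

```python
def wrap_slice_bounds(start_idx, segment_size, dim_len):
    """Emulate negative-index wraparound for x[..., start_idx:start_idx + segment_size].

    Hypothetical helper for illustration; it is not part of Paddle or PaddleSpeech.
    """
    start = start_idx
    end = start_idx + segment_size
    # Negative bounds are interpreted relative to the end of the axis.
    if start < 0:
        start += dim_len
    if end < 0:
        end += dim_len
    return start, end


# Assumed values: segment_size=32, padded time axis of 34 frames, start_idx=-1.
start, end = wrap_slice_bounds(start_idx=-1, segment_size=32, dim_len=34)
print(f"start = {start}, end = {end}")  # start = 33, end = 31
```

If this reading is right, a negative start index could arise when the random start is drawn from `length - segment_size` and an utterance in the batch has fewer frames than the segment.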

zxcd commented 1 year ago

Does the error also occur when running on a single GPU?

Vebrun commented 1 year ago

Does the error also occur when running on a single GPU?

Yes, it also fails on a single GPU, and csmsc/jets fails with the same error. Training tts0 and tts3 works fine. The jets error output follows:

========Args========
config: conf/default.yaml
dev_metadata: YG20221103_dump/dev/norm/metadata.jsonl
ngpu: 1
output_dir: exp/YG20221103
phones_dict: jets_csmsc_ckpt_1.5.0/phone_id_map.txt
speaker_dict: null
train_metadata: YG20221103_dump/train/norm/metadata.jsonl
voice_cloning: false

========Config========
batch_size: 12
cache_generator_outputs: True
discriminator_adv_loss_params:
  average_by_discriminators: False
  loss_type: mse
discriminator_optimizer_params:
  beta1: 0.8
  beta2: 0.99
  epsilon: 1e-09
  weight_decay: 0.0
discriminator_scheduler: exponential_decay
discriminator_scheduler_params:
  gamma: 0.999875
  learning_rate: 0.0002
energy_extract: energy
energy_extract_conf:
  reduction_factor: 1
  use_token_averaged_energy: False
energy_normalize: global_mvn
eval_interval_steps: 250
f0max: 400
f0min: 80
feat_match_loss_params:
  average_by_discriminators: False
  average_by_layers: False
  include_final_outputs: True
fmax: None
fmin: 0
fs: 22050
generator_adv_loss_params:
  average_by_discriminators: False
  loss_type: mse
generator_first: True
generator_optimizer_params:
  beta1: 0.8
  beta2: 0.99
  epsilon: 1e-09
  weight_decay: 0.0
generator_scheduler: exponential_decay
generator_scheduler_params:
  gamma: 0.999875
  learning_rate: 0.0002
lambda_adv: 1.0
lambda_align: 2.0
lambda_feat_match: 2.0
lambda_mel: 45.0
lambda_var: 1.0
mel_loss_params:
  fft_size: 1024
  fmax: None
  fmin: 0
  fs: 22050
  hop_size: 256
  log_base: None
  num_mels: 80
  win_length: None
  window: hann
model:
  cache_generator_outputs: True
  discriminator_params:
    follow_official_norm: False
    period_discriminator_params:
      bias: True
      channels: 32
      downsample_scales: [3, 3, 3, 3, 1]
      in_channels: 1
      kernel_sizes: [5, 3]
      max_downsample_channels: 1024
      nonlinear_activation: leakyrelu
      nonlinear_activation_params:
        negative_slope: 0.1
      out_channels: 1
      use_spectral_norm: False
      use_weight_norm: True
    periods: [2, 3, 5, 7, 11]
    scale_discriminator_params:
      bias: True
      channels: 128
      downsample_scales: [2, 2, 4, 4, 1]
      in_channels: 1
      kernel_sizes: [15, 41, 5, 3]
      max_downsample_channels: 1024
      max_groups: 16
      nonlinear_activation: leakyrelu
      nonlinear_activation_params:
        negative_slope: 0.1
      out_channels: 1
      use_spectral_norm: False
      use_weight_norm: True
    scale_downsample_pooling: AvgPool1D
    scale_downsample_pooling_params:
      kernel_size: 4
      padding: 2
      stride: 2
    scales: 1
  discriminator_type: hifigan_multi_scale_multi_period_discriminator
  generator_params:
    adim: 256
    aheads: 2
    conformer_activation_type: swish
    conformer_dec_kernel_size: 31
    conformer_enc_kernel_size: 7
    conformer_pos_enc_layer_type: rel_pos
    conformer_rel_pos_type: latest
    conformer_self_attn_layer_type: rel_selfattn
    decoder_normalize_before: True
    decoder_type: transformer
    dlayers: 4
    dunits: 1024
    duration_predictor_chans: 256
    duration_predictor_kernel_size: 3
    duration_predictor_layers: 2
    elayers: 4
    encoder_normalize_before: True
    encoder_type: transformer
    energy_embed_dropout: 0.0
    energy_embed_kernel_size: 1
    energy_predictor_chans: 256
    energy_predictor_dropout: 0.5
    energy_predictor_kernel_size: 3
    energy_predictor_layers: 2
    eunits: 1024
    generator_bias: True
    generator_channels: 512
    generator_global_channels: -1
    generator_kernel_size: 7
    generator_nonlinear_activation: leakyrelu
    generator_nonlinear_activation_params:
      negative_slope: 0.1
    generator_out_channels: 1
    generator_resblock_dilations: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
    generator_resblock_kernel_sizes: [3, 7, 11]
    generator_upsample_kernel_sizes: [16, 16, 4, 4]
    generator_upsample_scales: [8, 8, 2, 2]
    generator_use_additional_convs: True
    generator_use_weight_norm: True
    init_dec_alpha: 1.0
    init_enc_alpha: 1.0
    init_type: xavier_uniform
    pitch_embed_dropout: 0.0
    pitch_embed_kernel_size: 1
    pitch_predictor_chans: 256
    pitch_predictor_dropout: 0.5
    pitch_predictor_kernel_size: 5
    pitch_predictor_layers: 5
    positionwise_conv_kernel_size: 3
    positionwise_layer_type: conv1d
    segment_size: 64
    stop_gradient_from_energy_predictor: False
    stop_gradient_from_pitch_predictor: True
    transformer_dec_attn_dropout_rate: 0.2
    transformer_dec_dropout_rate: 0.2
    transformer_dec_positional_dropout_rate: 0.2
    transformer_enc_attn_dropout_rate: 0.2
    transformer_enc_dropout_rate: 0.2
    transformer_enc_positional_dropout_rate: 0.2
    use_cnn_in_conformer: True
    use_macaron_style_in_conformer: True
    use_masking: True
  generator_type: jets_generator
  sampling_rate: 22050
n_fft: 1024
n_mels: 80
n_shift: 256
num_snapshots: 10
num_workers: 4
pitch_extract: dio
pitch_extract_conf:
  reduction_factor: 1
  use_token_averaged_f0: False
pitch_normalize: global_mvn
pre_ckpt: /home/user/xiewenbiao/PaddleSpeech/examples/csmsc/jets/jets_csmsc_ckpt_1.5.0/snapshot_iter_256000.pdz
sampling_rate: 22050
save_interval_steps: 1000
seed: 777
train_max_steps: 350000
use_alignment_module: False
win_length: None
window: hann
master see the word size: 1, from pid: 40793
rank: 0, pid: 40793, parent_pid: 40792
single speaker jets!
spk_num: None
samplers done!
dataloaders done!
vocab_size: 268
W0801 21:32:44.964900 40793 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 10.2
W0801 21:32:44.970010 40793 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
model done!
criterions done!
optimizers done!
Trainer Done!
/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.int64, but right dtype is paddle.float32, the right dtype will convert to paddle.int64
  format(lhs_dtype, rhs_dtype, lhs_dtype))
/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int64, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
[2023-08-01 21:32:55] [INFO] [trainer.py:167] iter: 1/350000, Rank: 0, generator_loss: 98.560547, generator_generator_loss: 97.281311, generator_variance_loss: 1.279239, generator_generator_mel_loss: 89.507645, generator_generator_adv_loss: 2.250462, generator_generator_feat_match_loss: 5.523203, generator_variance_dur_loss: 0.089540, generator_variance_pitch_loss: 0.670074, generator_variance_energy_loss: 0.519626, real_loss: 1.151471, fake_loss: 1.110775, discriminator_loss: 2.262245, avg_reader_cost: 0.21947 sec, avg_batch_cost: 3.81660 sec, avg_samples: 12, avg_ips: 3.14416 sequences/sec
[2023-08-01 21:32:56] [INFO] [trainer.py:167] iter: 2/350000, Rank: 0, generator_loss: 135.287186, generator_generator_loss: 134.008362, generator_variance_loss: 1.278825, generator_generator_mel_loss: 80.932655, generator_generator_adv_loss: 46.143200, generator_generator_feat_match_loss: 6.932509, generator_variance_dur_loss: 0.111385, generator_variance_pitch_loss: 0.693848, generator_variance_energy_loss: 0.473592, real_loss: 37.198997, fake_loss: 24.103233, discriminator_loss: 61.302231, avg_reader_cost: 0.00044 sec, avg_batch_cost: 1.05273 sec, avg_samples: 12, avg_ips: 11.39896 sequences/sec
[2023-08-01 21:32:57] [INFO] [trainer.py:167] iter: 3/350000, Rank: 0, generator_loss: 92.084953, generator_generator_loss: 90.929031, generator_variance_loss: 1.155924, generator_generator_mel_loss: 80.727135, generator_generator_adv_loss: 3.290438, generator_generator_feat_match_loss: 6.911459, generator_variance_dur_loss: 0.109164, generator_variance_pitch_loss: 0.550319, generator_variance_energy_loss: 0.496441, real_loss: 5.251134, fake_loss: 4.988843, discriminator_loss: 10.239977, avg_reader_cost: 0.00046 sec, avg_batch_cost: 1.08345 sec, avg_samples: 12, avg_ips: 11.07577 sequences/sec
Exception in main training loop: (InvalidArgument) When step > 0, end should be greater than start, but received end = 54, start = 407.
  [Hint: Expected end >= start, but received end:54 < start:407.] (at /paddle/paddle/phi/kernels/funcs/slice_utils.h:74)
  [operator < slice > error]
Traceback (most recent call last):
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 149, in run
    update()
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/training/updaters/standard_updater.py", line 110, in update
    self.update_core(batch)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/jets/jets_updater.py", line 132, in update_core
    use_alignment_module=self.use_alignment_module)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/jets/jets.py", line 328, in forward
    use_alignment_module=use_alignment_module, )
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/jets/jets.py", line 405, in _forward_generator
    use_alignment_module=use_alignment_module, )
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/jets/generator.py", line 703, in forward
    self.segment_size, )
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/modules/nets_utils.py", line 352, in get_random_segments
    segments = get_segments(x, start_idxs, segment_size)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/modules/nets_utils.py", line 375, in get_segments
    segments[i] = x[i, :, start_idx:start_idx + segment_size]
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 736, in __getitem__
    return _getitem_impl_(self, item)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/variable_index.py", line 490, in _getitem_impl_
    attrs=attrs)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3604, in append_op
    inplace_map)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py", line 309, in trace_op
    not stop_gradient, inplace_map if inplace_map else {})
Trainer extensions will try to handle the extension.
Then all extensions will finalize.
Traceback (most recent call last):
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/exps/jets/train.py", line 311, in <module>
    main()
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/exps/jets/train.py", line 307, in main
    train_sp(args, config)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/exps/jets/train.py", line 263, in train_sp
    trainer.run()
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 198, in run
    six.reraise(*exc_info)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 149, in run
    update()
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/training/updaters/standard_updater.py", line 110, in update
    self.update_core(batch)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/jets/jets_updater.py", line 132, in update_core
    use_alignment_module=self.use_alignment_module)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/jets/jets.py", line 328, in forward
    use_alignment_module=use_alignment_module, )
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/jets/jets.py", line 405, in _forward_generator
    use_alignment_module=use_alignment_module, )
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/models/jets/generator.py", line 703, in forward
    self.segment_size, )
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/modules/nets_utils.py", line 352, in get_random_segments
    segments = get_segments(x, start_idxs, segment_size)
  File "/home/user/xiewenbiao/PaddleSpeech/paddlespeech/t2s/modules/nets_utils.py", line 375, in get_segments
    segments[i] = x[i, :, start_idx:start_idx + segment_size]
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 736, in __getitem__
    return _getitem_impl_(self, item)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/variable_index.py", line 490, in _getitem_impl_
    attrs=attrs)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3604, in append_op
    inplace_map)
  File "/home/user/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py", line 309, in trace_op
    not stop_gradient, inplace_map if inplace_map else {})
ValueError: (InvalidArgument) When step > 0, end should be greater than start, but received end = 54, start = 407.
  [Hint: Expected end >= start, but received end:54 < start:407.] (at /paddle/paddle/phi/kernels/funcs/slice_utils.h:74)
  [operator < slice > error]
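The JETS numbers fit a negative start index as well: with `segment_size: 64` from the config above, `end = 54` would correspond to a start index of -10, and `start = 407` would then imply a padded time axis of 417 frames. As a possible workaround (a sketch only, not the project's fix), the upper bound for the random start can be clamped at zero so the start is never negative. The function `safe_random_start` and its arguments are hypothetical, and the frame counts in the usage lines are assumed values.

```python
import random


def safe_random_start(length, segment_size, rng=random):
    """Pick a start index in [0, length - segment_size], or 0 for short inputs.

    Hypothetical helper; short inputs still need padding or filtering upstream.
    """
    max_start = max(length - segment_size, 0)  # clamp so the start is never negative
    return rng.randint(0, max_start)  # randint is inclusive on both ends


# An utterance shorter than the segment (54 frames assumed) now yields start 0.
print(safe_random_start(length=54, segment_size=64))  # 0
# Longer utterances still get a random start that leaves room for a full segment.
print(0 <= safe_random_start(length=417, segment_size=64) <= 353)  # True
```

Clamping only avoids the invalid slice. A segment cut from a too-short utterance would still be shorter than `segment_size`, so filtering or padding such utterances may be the cleaner option.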

zxcd commented 1 year ago

You may need to investigate why this happens:

ValueError: (InvalidArgument) When step > 0, end should be greater than start, but received end = 54, start = 407.
[Hint: Expected end >= start, but received end:54 < start:407.] (at /paddle/paddle/phi/kernels/funcs/slice_utils.h:74)
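One way to start that investigation, assuming the cause is utterances shorter than the training segment, is to scan the training metadata for short entries. The sketch below works on JSON Lines text. The field names `utt_id` and `feats_lengths` are assumptions about the `metadata.jsonl` layout and should be checked against the actual file, and the sample records are made up for illustration.

```python
import json


def find_short_utterances(lines, segment_size, length_key="feats_lengths"):
    """Return (utt_id, length) for records whose length is below segment_size."""
    short = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        length = record.get(length_key)
        if length is None:
            continue  # field name differs from the assumption; nothing to compare
        if length < segment_size:
            short.append((record.get("utt_id"), length))
    return short


# Made-up records for illustration only; real ones would come from e.g.
# open("YG20221103_dump/train/norm/metadata.jsonl", encoding="utf-8").
sample = [
    '{"utt_id": "utt_a", "feats_lengths": 417}',
    '{"utt_id": "utt_b", "feats_lengths": 54}',
]
print(find_short_utterances(sample, segment_size=64))  # [('utt_b', 54)]
```

Any entries reported this way could be removed from the metadata or padded before training, and the run repeated to see whether the slice error disappears.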