Closed waltcow closed 1 year ago
The auto-label step finished without any problems:
2023-07-20 02:11:33,456 - modelscope - INFO - Use user-specified model revision: v1.0.5 --- New folder /content/output_training_data/paragraph/prosody... --- --- OK --- --- New folder /content/output_training_data/sp_interval... --- --- OK --- --- New folder /content/output_training_data/wav... --- --- OK --- --- Remove /content/output_training_data/log folder! --- --- New folder /content/output_training_data/log... --- --- OK --- 2023-07-20 02:12:03 wav_preprocess start... --- new folder... --- --- OK --- 100%|██████████| 1/1 [00:00<00:00, 23.94it/s]wav cut by vad start... 100%|██████████| 1/1 [00:00<00:00, 2.59it/s] 100%|██████████| 1/1 [00:00<00:00, 2.00it/s] Text to label start... 100%|██████████| 1/1 [00:00<00:00, 1.91it/s] pre-break recording in paragraph by vad. Generate phone interval by asr align. --- New folder /content/output_training_data/align... --- --- OK --- prosody_dir=/content/output_training_data/paragraph/prosody run_asr_align step 2 speak_script=/content/output_training_data/align/script.txt job_num=1 process_num=4 fbank_config=/root/.cache/modelscope/hub/damo/speech_ptts_autolabel_16k/model/fsmn_16k_2/fbank.conf, data_dir=/content/output_training_data/align/gen/data, fbank_dir=/content/output_training_data/align/gen/fbank run make_fbank with num=1 config_path=/root/.cache/modelscope/hub/damo/speech_ptts_autolabel_16k/model/fsmn_16k_2/fbank.conf data_path=/content/output_training_data/align/gen/data fbank_path=/content/output_training_data/align/gen/fbank [{'id': 'test_0_0', 'wav': '/content/output_training_data/wav_cut/test_0_0.wav'}] 100%|██████████| 1/1 [00:01<00:00, 1.90s/it]DONE compute fbank and copy feats DONE! 
job_num=1 process_num=4 data_dir=/content/output_training_data/align/gen/data lm_dir=/root/.cache/modelscope/hub/damo/speech_ptts_autolabel_16k/model/lang am_dir=/root/.cache/modelscope/hub/damo/speech_ptts_autolabel_16k/model/fsmn_16k_2, fbank_dir=/content/output_training_data/align/gen/fbank, align_dir=/content/output_training_data/align/gen/align [{'id': 'test_0_0', 'ark': '/content/output_training_data/align/gen/fbank/raw_fbank_data.test_0_0.ark', 'scp': '/content/output_training_data/align/gen/fbank/raw_fbank_data.test_0_0.scp'}] Feature preprocessing start... 100%|██████████| 1/1 [00:05<00:00, 5.20s/it]Waveform aligning start... 100%|██████████| 1/1 [00:01<00:00, 1.62s/it]do_align done! --- new folder... --- --- OK --- test_0_0.ali Trim silence wav with align info and modify wav files.... 100%|██████████| 1/1 [00:00<00:00, 80.69it/s]Convert align info to interval files.... --- There is this folder! --- test_0_0.ali Modify sil to sp in interval.... modify interval er phone. --- Remove /content/output_training_data/interval folder! --- --- New folder /content/output_training_data/interval... --- --- OK --- qualification review. prosody sillence detect. --- Remove /content/output_training_data/prosody folder! --- --- New folder /content/output_training_data/prosody... --- --- OK --- average silence duration: 0.3249999999999996 100%|██████████| 2/2 [00:00<00:00, 3506.94it/s]Write prosody file 0 "mismatch" sentences Auto labeling info: stage 1 | develop mode 0 | gender:female | score 10.000000 | retcode 0 labeling report: stage 1 | develop mode 0 | gender female | score 10.000000 | retcode 0 qulification report: credit score: 10.000000 qualified score: 3.000000 normalized snr: 35.000000 abandon utt snr threshold: 10.000000 snr score ration: 0.500000 interval score ration: 0.500000 data qulificaion report:
Training then failed with an error:
2023-07-20 02:13:16,273 - modelscope - INFO - Use user-specified model revision: v1.0.6 2023-07-20 02:13:17,519 - modelscope - INFO - Use user-specified model revision: v1.0.6 2023-07-20 02:13:18,124 - modelscope - INFO - Set workdir to ./pretrain_work_dir/ 2023-07-20 02:13:18,171 - modelscope - INFO - load ./output_training_data/ 2023-07-20 02:13:18,561 - modelscope - INFO - Use user-specified model revision: v1.0.6 2023-07-20 02:13:37,195 - modelscope - INFO - am_config=./pretrain_work_dir/orig_model/basemodel_16k/sambert/config.yaml voc_config=./pretrain_work_dir/orig_model/basemodel_16k/hifigan/config.yaml 2023-07-20 02:13:37,197 - modelscope - INFO - audio_config=./pretrain_work_dir/orig_model/basemodel_16k/audio_config_se_16k.yaml 2023-07-20 02:13:37,198 - modelscope - INFO - am_ckpts=OrderedDict([(2400000, './pretrain_work_dir/orig_model/basemodel_16k/sambert/ckpt/checkpoint_2400000.pth')]) 2023-07-20 02:13:37,200 - modelscope - INFO - voc_ckpts=OrderedDict([(2400000, './pretrain_work_dir/orig_model/basemodel_16k/hifigan/ckpt/checkpoint_2400000.pth')]) 2023-07-20 02:13:37,203 - modelscope - INFO - se_path=./pretrain_work_dir/orig_model/se.npy se_model_path=./pretrain_work_dir/orig_model/basemodel_16k/speaker_embedding/se.onnx 2023-07-20 02:13:37,204 - modelscope - INFO - mvn_path=./pretrain_work_dir/orig_model/mvn.npy 100%|██████████| 2/2 [00:00<00:00, 2823.50it/s]TextScriptConvertor.process: Save script to: ./pretrain_work_dir/data/Script.xml TextScriptConvertor.process: Save metafile to: ./pretrain_work_dir/data/raw_metafile.txt [AudioProcessor] Initialize AudioProcessor. 
[AudioProcessor] config params: [AudioProcessor] wav_normalize: True [AudioProcessor] trim_silence: True [AudioProcessor] trim_silence_threshold_db: 60 [AudioProcessor] preemphasize: False [AudioProcessor] sampling_rate: 16000 [AudioProcessor] hop_length: 200 [AudioProcessor] win_length: 1000 [AudioProcessor] n_fft: 2048 [AudioProcessor] n_mels: 80 [AudioProcessor] fmin: 0.0 [AudioProcessor] fmax: 8000.0 [AudioProcessor] phone_level_feature: True [AudioProcessor] se_feature: True [AudioProcessor] norm_type: mean_std [AudioProcessor] max_norm: 1.0 [AudioProcessor] symmetric: False [AudioProcessor] min_level_db: -100.0 [AudioProcessor] ref_level_db: 20 [AudioProcessor] num_workers: 16 [AudioProcessor] Amplitude normalization started Volume statistic proceeding... 100%|██████████| 1/1 [00:00<00:00, 1.70it/s] Average amplitude RMS : 0.126146 Volume statistic done. Volume normalization proceeding... 100%|██████████| 1/1 [00:00<00:00, 530.12it/s]Volume normalization done. [AudioProcessor] Amplitude normalization finished [AudioProcessor] Duration generation started 0%| | 0/1 [00:00<?, ?it/s][AudioProcessor] Duration align with mel is proceeding... 100%|██████████| 1/1 [00:01<00:00, 1.14s/it] [AudioProcessor] Duration generate finished [AudioProcessor] Trim silence with interval started [AudioProcessor] Start to load pcm from ./pretrain_work_dir/data/wav 100%|██████████| 1/1 [00:01<00:00, 1.08s/it] 0%| | 0/1 [00:01<?, ?it/s] 100%|██████████| 1/1 [00:00<00:00, 815.70it/s][AudioProcessor] Trim silence finished [AudioProcessor] Melspec extraction started 100%|██████████| 1/1 [00:01<00:00, 1.57s/it] [AudioProcessor] Melspec extraction finished Melspec statistic proceeding... 100%|██████████| 1/1 [00:00<00:00, 3236.35it/s] 100%|██████████| 1/1 [00:00<00:00, 363.39it/s]Melspec statistic done [AudioProcessor] melspec mean and std saved to: ./pretrain_work_dir/data/mel/mel_mean.txt, ./pretrain_work_dir/data/mel/mel_std.txt [AudioProcessor] Melspec mean std norm is proceeding... 
[AudioProcessor] Melspec normalization finished [AudioProcessor] Normed Melspec saved to ./pretrain_work_dir/data/mel [AudioProcessor] Pitch extraction started 0%| | 0/1 [00:00<?, ?it/s][AudioProcessor] Pitch align with mel is proceeding... 100%|██████████| 1/1 [00:01<00:00, 1.69s/it] [AudioProcessor] Pitch normalization is proceeding... 100%|██████████| 1/1 [00:00<00:00, 4128.25it/s] 100%|██████████| 1/1 [00:00<00:00, 3721.65it/s][AudioProcessor] f0 mean and std saved to: ./pretrain_work_dir/data/f0/f0_mean.txt, ./pretrain_work_dir/data/f0/f0_std.txt [AudioProcessor] Pitch mean std norm is proceeding... [AudioProcessor] Pitch turn to phone-level is proceeding... 100%|██████████| 1/1 [00:01<00:00, 1.55s/it] [AudioProcessor] Pitch normalization finished [AudioProcessor] Normed f0 saved to ./pretrain_work_dir/data/f0 [AudioProcessor] Pitch extraction finished [AudioProcessor] Energy extraction started 100%|██████████| 1/1 [00:01<00:00, 1.12s/it] 100%|██████████| 1/1 [00:00<00:00, 252.64it/s] 100%|██████████| 1/1 [00:00<00:00, 3682.44it/s][AudioProcessor] energy mean and std saved to: ./pretrain_work_dir/data/energy/energy_mean.txt, ./pretrain_work_dir/data/energy/energy_std.txt [AudioProcessor] Energy mean std norm is proceeding... 100%|██████████| 1/1 [00:01<00:00, 1.08s/it] [AudioProcessor] Energy normalization finished [AudioProcessor] Normed Energy saved to ./pretrain_work_dir/data/energy [AudioProcessor] Energy extraction finished [AudioProcessor] All features extracted successfully! Processing audio done. [SpeakerEmbeddingProcessor] Speaker embedding extractor started [SpeakerEmbeddingProcessor] se model loading error!!! [SpeakerEmbeddingProcessor] please update your se model to ensure that the version is greater than or equal to 1.0.5 [SpeakerEmbeddingProcessor] try load it as se.model [SpeakerEmbeddingProcessor] Speaker embedding extracted successfully! Processing speaker embedding done. Processing done. Voc metafile generated. AM metafile generated. 
2023-07-20 02:14:06,035 - modelscope - INFO - Start training.... 2023-07-20 02:14:06,040 - modelscope - INFO - Start SAMBERT training... 2023-07-20 02:14:06,042 - modelscope - INFO - TRAIN SAMBERT.... 2023-07-20 02:14:06,059 - modelscope - INFO - TRAINING steps: 2400202 2023-07-20 02:14:06,069 - modelscope - INFO - audio_config = {'fmax': 8000.0, 'fmin': 0.0, 'hop_length': 200, 'max_norm': 1.0, 'min_level_db': -100.0, 'n_fft': 2048, 'n_mels': 80, 'norm_type': 'mean_std', 'num_workers': 16, 'phone_level_feature': True, 'preemphasize': False, 'ref_level_db': 20, 'sampling_rate': 16000, 'symmetric': False, 'trim_silence': True, 'trim_silence_threshold_db': 60, 'wav_normalize': True, 'win_length': 1000} 2023-07-20 02:14:06,070 - modelscope - INFO - Loss = {'MelReconLoss': {'enable': True, 'params': {'loss_type': 'mae'}}, 'ProsodyReconLoss': {'enable': True, 'params': {'loss_type': 'mae'}}} 2023-07-20 02:14:06,072 - modelscope - INFO - Model = {'KanTtsSAMBERT': {'optimizer': {'params': {'betas': [0.9, 0.98], 'eps': 1e-09, 'lr': 0.001, 'weight_decay': 0.0}, 'type': 'Adam'}, 'params': {'MAS': False, 'NSF': True, 'SE': True, 'decoder_attention_dropout': 0.1, 'decoder_dropout': 0.1, 'decoder_ffn_inner_dim': 1024, 'decoder_num_heads': 8, 'decoder_num_layers': 12, 'decoder_num_units': 128, 'decoder_prenet_units': [256, 256], 'decoder_relu_dropout': 0.1, 'dur_pred_lstm_units': 128, 'dur_pred_prenet_units': [128, 128], 'embedding_dim': 512, 'emotion_units': 32, 'encoder_attention_dropout': 0.1, 'encoder_dropout': 0.1, 'encoder_ffn_inner_dim': 1024, 'encoder_num_heads': 8, 'encoder_num_layers': 8, 'encoder_num_units': 128, 'encoder_projection_units': 32, 'encoder_relu_dropout': 0.1, 'max_len': 800, 'nsf_f0_global_maximum': 730.0, 'nsf_f0_global_minimum': 30.0, 'nsf_norm_type': 'global', 'num_mels': 82, 'outputs_per_step': 3, 'postnet_dropout': 0.1, 'postnet_ffn_inner_dim': 512, 'postnet_filter_size': 41, 'postnet_fsmn_num_layers': 4, 'postnet_lstm_units': 128, 
'postnet_num_memory_units': 256, 'postnet_shift': 17, 'predictor_dropout': 0.1, 'predictor_ffn_inner_dim': 256, 'predictor_filter_size': 41, 'predictor_fsmn_num_layers': 3, 'predictor_lstm_units': 128, 'predictor_num_memory_units': 128, 'predictor_shift': 0, 'speaker_units': 192}, 'scheduler': {'params': {'warmup_steps': 4000}, 'type': 'NoamLR'}}} 2023-07-20 02:14:06,074 - modelscope - INFO - allow_cache = False 2023-07-20 02:14:06,084 - modelscope - INFO - batch_size = 32 2023-07-20 02:14:06,085 - modelscope - INFO - create_time = 2023-07-20 02:14:06 2023-07-20 02:14:06,087 - modelscope - INFO - eval_interval_steps = 10000000000000000 2023-07-20 02:14:06,090 - modelscope - INFO - git_revision_hash = d16755444c9baf23348213211a5ed9035458ecf0 2023-07-20 02:14:06,093 - modelscope - INFO - grad_norm = 1.0 2023-07-20 02:14:06,096 - modelscope - INFO - linguistic_unit = {'cleaners': 'english_cleaners', 'lfeat_type_list': 'sy,tone,syllable_flag,word_segment,emo_category,speaker_category', 'speaker_list': 'F7'} 2023-07-20 02:14:06,098 - modelscope - INFO - log_interval_steps = 50 2023-07-20 02:14:06,099 - modelscope - INFO - model_type = sambert 2023-07-20 02:14:06,100 - modelscope - INFO - num_save_intermediate_results = 4 2023-07-20 02:14:06,101 - modelscope - INFO - num_workers = 4 2023-07-20 02:14:06,102 - modelscope - INFO - pin_memory = False 2023-07-20 02:14:06,105 - modelscope - INFO - remove_short_samples = False 2023-07-20 02:14:06,111 - modelscope - INFO - save_interval_steps = 200 2023-07-20 02:14:06,113 - modelscope - INFO - train_max_steps = 2400202 2023-07-20 02:14:06,115 - modelscope - INFO - train_steps = 202 2023-07-20 02:14:06,119 - modelscope - INFO - log_interval = 10 2023-07-20 02:14:06,121 - modelscope - INFO - modelscope_version = 1.7.1 Loading metafile... 0it [00:00, ?it/s]Loading metafile... 100%|██████████| 1/1 [00:00<00:00, 9198.04it/s] 2023-07-20 02:14:06,139 - modelscope - INFO - The number of training files = 0. 
2023-07-20 02:14:06,141 - modelscope - INFO - The number of validation files = 1. --------------------------------------------------------------------------- ValueError Traceback (most recent call last) [<ipython-input-15-0089498a7012>](https://localhost:8080/#) in <cell line: 33>() 31 default_args=kwargs) 32 ---> 33 trainer.train()
I haven't run into this before. Try running it again with the latest Colab notebook; it was updated yesterday.
OK, I'll try again.
I'm still hitting the same problem. @KevinWang676
In that case I'd suggest running the Alibaba Cloud notebook in Alibaba Cloud's Notebook environment; it works the same way.
I found the problem: the audio sample was too short, so training could not run.
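For anyone who lands here with the same symptom ("The number of training files = 0" followed by a ValueError in `trainer.train()`), a quick check before training is to count the wav files and add up their durations. The following is only a minimal sketch using Python's standard `wave` module, which reads plain PCM wav files. The function names and the `min_files` / `min_seconds` thresholds are hypothetical illustrations, not values taken from ModelScope or KAN-TTS.

```python
import wave
from pathlib import Path


def summarize_wavs(wav_dir):
    """Return (file_count, total_seconds) for the .wav files directly inside wav_dir."""
    count = 0
    total_seconds = 0.0
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            # duration in seconds = number of frames / sampling rate
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
        count += 1
    return count, total_seconds


def has_enough_data(count, total_seconds, min_files=2, min_seconds=60.0):
    """Rough sanity check before fine-tuning.

    min_files and min_seconds are hypothetical thresholds chosen for
    illustration; adjust them to whatever your training setup needs.
    """
    return count >= min_files and total_seconds >= min_seconds
```

Calling `summarize_wavs("./output_training_data/wav")` (the wav folder that appears in the log above) and passing the result to `has_enough_data` would flag a single short recording like the one in this issue before the trainer is started.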