babysor / MockingBird

🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time

GPU out-of-memory error when training the vocoder #869

Closed: hujb2000 closed this issue 1 year ago

hujb2000 commented 1 year ago

The full command:

    C:\ProgramData\Anaconda3\envs\mockingbird\python.exe E:\workspace\MockingBird\control\cli\vocoder_preprocess.py e:\datasets -m E:\workspace\MockingBird\data\ckpt\synthesizer\mandarin

The full log:

    Arguments:
        datasets_root:   e:\datasets
        model_dir:       E:\workspace\MockingBird\data\ckpt\synthesizer\mandarin
        hparams:
        no_trim:         False
        cpu:             False

    {'sample_rate': 16000, 'n_fft': 1024, 'num_mels': 80, 'hop_size': 256, 'win_size': 1024,
     'fmin': 55, 'min_level_db': -100, 'ref_level_db': 20, 'max_abs_value': 4.0,
     'preemphasis': 0.97, 'preemphasize': True, 'tts_embed_dims': 512, 'tts_encoder_dims': 256,
     'tts_decoder_dims': 128, 'tts_postnet_dims': 512, 'tts_encoder_K': 5, 'tts_lstm_dims': 1024,
     'tts_postnet_K': 5, 'tts_num_highways': 4, 'tts_dropout': 0.5,
     'tts_cleaner_names': ['basic_cleaners'], 'tts_stop_threshold': -3.4,
     'tts_schedule': [(2, 0.001, 10000, 12), (2, 0.0005, 15000, 12), (2, 0.0002, 20000, 12),
                      (2, 0.0001, 30000, 12), (2, 5e-05, 40000, 12), (2, 1e-05, 60000, 12),
                      (2, 5e-06, 160000, 12), (2, 3e-06, 320000, 12), (2, 1e-06, 640000, 12)],
     'tts_clip_grad_norm': 1.0, 'tts_eval_interval': 500, 'tts_eval_num_samples': 1,
     'tts_finetune_layers': [], 'max_mel_frames': 900, 'rescale': True, 'rescaling_max': 0.9,
     'synthesis_batch_size': 16, 'signal_normalization': True, 'power': 1.5,
     'griffin_lim_iters': 60, 'fmax': 7600, 'allow_clipping_in_normalization': True,
     'clip_mels_length': True, 'use_lws': False, 'symmetric_mels': True, 'trim_silence': False,
     'speaker_embedding_size': 256, 'silence_min_duration_split': 0.4,
     'utterance_min_duration': 0.5, 'use_gst': True, 'use_ser_for_gst': True}
    Synthesizer using device: cuda
    Trainable Parameters: 0.000M

    Loading weights at E:\workspace\MockingBird\data\ckpt\synthesizer\mandarin\mandarin.pt
    Tacotron weights loaded from step 88000
    Using inputs from:
        e:\datasets\SV2TTS\synthesizer\train.txt
        e:\datasets\SV2TTS\synthesizer\mels
        e:\datasets\SV2TTS\synthesizer\embeds
    Found 164904 samples
      0%| | 1/10307 [00:10<31:17:15, 10.93s/it]
    Traceback (most recent call last):
      File "E:\workspace\MockingBird\control\cli\vocoder_preprocess.py", line 58, in <module>
        run_synthesis(args.in_dir, args.out_dir, args.model_dir, modified_hp)
      File "E:\workspace\MockingBird\models\synthesizer\synthesize.py", line 82, in run_synthesis
        _, mels_out, _, _ = model(texts, mels, embeds)
      File "C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\workspace\MockingBird\models\synthesizer\models\tacotron.py", line 281, in forward
        postnet_out = self.postnet(mel_outputs)
      File "C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\workspace\MockingBird\models\synthesizer\models\sublayer\cbhg.py", line 62, in forward
        x = self.maxpool(conv_bank)[:, :, :seq_len]
      File "C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\pooling.py", line 92, in forward
        return F.max_pool1d(input, self.kernel_size, self.stride,
      File "C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\torch\_jit_internal.py", line 484, in fn
        return if_false(*args, **kwargs)
      File "C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\torch\nn\functional.py", line 696, in _max_pool1d
        return torch.max_pool1d(input, kernel_size, stride, padding, dilation, ceil_mode)
    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 626.00 MiB (GPU 0; 12.00 GiB total capacity; 10.76 GiB already allocated; 0 bytes free; 10.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I'm a novice and don't know how to fix this.
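The error text itself points at one mitigation for the fragmentation case (reserved memory >> allocated memory): setting max_split_size_mb through the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch, assuming the variable is set before torch first initializes CUDA; the 128 MiB value here is an arbitrary example, and this only helps against fragmentation, not against a batch that is simply too large:

    # Hedged sketch: cap the CUDA caching allocator's block split size, as the
    # OOM message hints. Must run before torch touches CUDA for the first time.
    import os
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

    import torch  # the allocator reads the variable on first CUDA use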

hujb2000 commented 1 year ago

Solved after reading the author's documentation.

Vocoder, when preprocessing the dataset: reduce the batch-size parameter (here, synthesis_batch_size) in synthesizer/hparams.py.

Data Preprocessing

    max_mel_frames = 900,
    rescale = True,
    rescaling_max = 0.9,
    synthesis_batch_size = 8,                  # For vocoder preprocessing and inference.

That is, change synthesis_batch_size from 16 to 8.
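For context, synthesis_batch_size sets how many mel spectrograms go through the Tacotron forward pass at once during vocoder preprocessing, and the batch dimension multiplies the size of every activation the model allocates, so halving it roughly halves peak GPU memory. A minimal, self-contained sketch of that relationship, using a stand-in layer rather than MockingBird's actual model:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    layer = torch.nn.Conv1d(80, 512, kernel_size=5, padding=2).to(device)  # stand-in for Tacotron

    synthesis_batch_size = 8  # was 16; every activation below shrinks by the same factor
    mels = torch.randn(synthesis_batch_size, 80, 900, device=device)  # (batch, num_mels, frames)

    with torch.no_grad():  # preprocessing is inference only: keep no autograd buffers
        out = layer(mels)
    print(out.shape)  # torch.Size([8, 512, 900])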