AlexandaJerry / whisper-vits-japanese

VITS Japanese with Whisper as data processor (you can train your VITS even if you only have audio)
MIT License

[Help needed] WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme. #9

Open KJZH001 opened 1 year ago

KJZH001 commented 1 year ago

I did of course see this in your code comments:

# If the output shows "no phoneme" messages, check whether English transcriptions appear in the two txt files under /content/whisper-vits-japanese/filelists

However, I'd like to know something more specific: what exactly do you mean by "English transcriptions"? I'm not quite sure I understand, sorry.

I've opened all four files in that directory; the filenames match the ones in your video.

Two of them are txt files containing the training set, and at a glance they all seem to be Japanese; the other two contain phonetic annotations with arrows.

How can I tell whether they contain English transcriptions, and how should I handle it if they do?

One more question: during the Whisper run I noticed some sentences where English words are mixed into the Japanese. Will that affect training?

Hoping for some help, thanks!

KJZH001 commented 1 year ago

[screenshot of the warning output]

This is the full warning output. For some reason copying text over remote desktop loses the formatting, so I just took a screenshot.

In case the image doesn't show, here is the raw text:

START: /content/whisper-vits-japanese/filelists/train_filelist.txt
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_push_word() in jpcommon_label.c: First mora should not be long vowel symbol.
WARNING: JPCommonLabel_push_word() in jpcommon_label.c: First mora should not be long vowel symbol.
WARNING: JPCommonLabel_push_word() in jpcommon_label.c: First mora should not be long vowel symbol.
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_insert_pause() in jpcommon_label.c: First mora should not be short pause.
START: /content/whisper-vits-japanese/filelists/val_filelist.txt
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_push_word() in jpcommon_label.c: First mora should not be long vowel symbol.
WARNING: JPCommonLabel_push_word() in jpcommon_label.c: First mora should not be long vowel symbol.
WARNING: JPCommonLabel_push_word() in jpcommon_label.c: First mora should not be long vowel symbol.
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_make() in jcomon_label.c: No phoneme.
WARNING: JPCommonLabel_insert_pause() in jpcommon_label.c: First mora should not be short pause.
AlexandaJerry commented 1 year ago

Hi! These are notices rather than errors, and they won't affect the run or the training that follows. It happens because Whisper can be told which language to expect but cannot be locked to that language: if English or Chinese appears in the Japanese audio, Whisper will still transcribe those words, and since they are not Japanese words they cannot be annotated with pitch accent / mora information during preprocessing, hence the notices.
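
For reference, a minimal sketch of how such lines could be spotted, assuming the usual VITS filelist layout of wav_path|transcription and the filelists/ paths used in this notebook (illustrative only, not part of the project):

# Flag filelist entries whose transcription contains Latin letters,
# since the Japanese cleaner cannot turn them into phonemes.
import re

for name in ("filelists/train_filelist.txt", "filelists/val_filelist.txt"):
    with open(name, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            text = line.rstrip("\n").split("|")[-1]
            if re.search(r"[A-Za-z]", text):
                print(f"{name}:{lineno}: {text}")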

KJZH001 commented 1 year ago

Thanks for the explanation.

So that means I can just keep running it, right? Also, when should inference be done? My understanding is that inference happens after the training step above is finished, and that if the model gets trained further afterwards, inference has to be run again. Is that correct?

One last note: I haven't actually tried whether I can keep training yet, since Colab doesn't seem to want to give me a GPU backend today.

AlexandaJerry commented 1 year ago

Yes, just keep it running; it won't affect anything downstream. Inference is usually done after 200-300 epochs: pause the training cell, run the inference cell, and once inference is done you can click back into the training cell and continue training. If you're impatient, you can stop training after 100 epochs and try inference to get a feel for it.

KJZH001 commented 1 year ago

Thanks, I'll give it a try later.

KJZH001 commented 1 year ago

Since my previous training set wasn't saved completely, I spent some time re-running Whisper.

But during training something else seems to have gone wrong, and it didn't keep running normally the way you described. I'll paste the log shortly; I hope you can take a look at what's going on.

KJZH001 commented 1 year ago
[INFO] {'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 1234, 'epochs': 800, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 24, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/train_filelist.txt.cleaned', 'validation_files': 'filelists/val_filelist.txt.cleaned', 'text_cleaners': ['japanese_cleaners'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0, 'cleaned_text': True}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs/isla_base'}
2023-03-04 09:59:22.103953: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-04 09:59:22.104109: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-04 09:59:22.104126: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at ../aten/src/ATen/EmptyTensor.cpp:31.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py:197: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [1, 9, 96], strides() = [39648, 96, 1]
bucket_view.sizes() = [1, 9, 96], strides() = [864, 96, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:325.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[INFO] Train Epoch: 1 [0%]
[INFO] [6.065995216369629, 6.065133094787598, 0.3276067078113556, 103.9568862915039, 1.7774611711502075, 203.92404174804688, 0, 0.0002]
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "train.py", line 295, in <module>
    main()
  File "train.py", line 55, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/whisper-vits-japanese/train.py", line 122, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/content/whisper-vits-japanese/train.py", line 229, in train_and_evaluate
    evaluate(hps, net_g, eval_loader, writer_eval)
  File "/content/whisper-vits-japanese/train.py", line 241, in evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(eval_loader):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/whisper-vits-japanese/data_utils.py", line 101, in __getitem__
    return self.get_audio_text_pair(self.audiopaths_and_text[index])
  File "/content/whisper-vits-japanese/data_utils.py", line 62, in get_audio_text_pair
    spec, wav = self.get_audio(audiopath)
  File "/content/whisper-vits-japanese/data_utils.py", line 83, in get_audio
    spec = spectrogram_torch(audio_norm, self.filter_length,
  File "/content/whisper-vits-japanese/mel_processing.py", line 52, in spectrogram_torch
    if torch.min(y) < -1.:
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

Strange, why does the formatting get lost as soon as I copy it out?

Dragging to select is fine, though.

AlexandaJerry commented 1 year ago

Hi, this is because the audio sample depth is not 16-bit. The default in the config is 2^15, "max_wav_value": 32768.0. You can either A. batch-convert the audio to a 16-bit sample depth, or B. set "max_wav_value" to 2^(n-1) to match your audio's actual sample depth.
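
A minimal sketch of option A, assuming the soundfile package is available and the clips live in a sliced_audio/ directory (both are assumptions on my part, not repo defaults):

# Re-encode every wav in place as 16-bit PCM, so that dividing by
# max_wav_value = 32768.0 maps the samples into the (-1, 1) range.
import glob
import soundfile as sf

for path in glob.glob("sliced_audio/*.wav"):
    audio, sr = sf.read(path)                    # decoded as float in [-1, 1]
    sf.write(path, audio, sr, subtype="PCM_16")  # written back as 16-bit PCM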

KJZH001 commented 1 year ago

Thanks for the explanation, but there's one point where I still don't follow you.

For option B, does n refer to the number of bits of the sample depth? I checked with a calculator: taking 16-bit as an example, 2^15 is exactly 32768.

Since pulling files of this size back from Google Drive is genuinely difficult from inside China, I'll try option B first.

If that doesn't work, I'll use an overseas Windows server and run Format Factory to do the conversion. In any case, thanks for your help.

KJZH001 commented 1 year ago

Hi, something unexpected seems to have come up. I switched the Colab runtime to None and ran the following Python code:

# Check the sample depth
import wave

with wave.open('/content/drive/MyDrive/sliced_audio/$2$min0618 #15328_0.wav', 'rb') as f:
    print(f.getsampwidth() * 8)  # sample depth in bits

with wave.open('/content/drive/MyDrive/sliced_audio/$2$min0658_0.wav', 'rb') as f:
    print(f.getsampwidth() * 8)  # sample depth in bits

with wave.open('/content/drive/MyDrive/sliced_audio/$2$min0675_1.wav', 'rb') as f:
    print(f.getsampwidth() * 8)  # sample depth in bits

This is the output:

16
16
16

The files were randomly picked from an earlier backup made with the "Save Materials and Checkpoints to Drive for Future Usage" cell.

So it looks like the sample depth across the training set is 16-bit.

Also, here is what I got by parsing a file locally with ffprobe. Could the problem be somewhere else, then?

ffprobe version 3.2.2 Copyright (c) 2007-2016 the FFmpeg developers
  built with gcc 5.4.0 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
  libavutil      55. 34.100 / 55. 34.100
  libavcodec     57. 64.101 / 57. 64.101
  libavformat    57. 56.100 / 57. 56.100
  libavdevice    57.  1.100 / 57.  1.100
  libavfilter     6. 65.100 /  6. 65.100
  libswscale      4.  2.100 /  4.  2.100
  libswresample   2.  3.100 /  2.  3.100
  libpostproc    54.  1.100 / 54.  1.100
Input #0, wav, from 'C:\Users\用户名\Downloads\$2$min0620 #15333_0.wav':
  Duration: 00:00:05.47, bitrate: 352 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, s16, 352 kb/s
KJZH001 commented 1 year ago

Here's a follow-up after some more testing.

!file '/content/whisper-vits-japanese/audio/pm_a02_02_isl0093.wav' (this is an audio file from the original White dataset you provided)

/content/whisper-vits-japanese/audio/pm_a02_02_isl0093.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 22050 Hz

!file '/content/drive/MyDrive/sliced_audio/$2$min0675_1.wav' (this is an audio file from my own training set)

/content/drive/MyDrive/sliced_audio/$2$min0675_1.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 22050 Hz

Sorry for posting so many messages in a row; I hope I haven't been a bother.

AlexandaJerry commented 1 year ago

Yes, n is the bit count of the sample depth. The reason I suspected a bit-depth problem is that a viewer on Bilibili once asked me about if torch.min(y) < -1.: before. The original code normalizes loudness to 1: for 16-bit audio the sample values lie in the range (-32768, 32768), so after dividing by 2^15 (that is, by "max_wav_value": 32768.0) they fall exactly into (-1, 1). If the bit depth is greater than 16-bit, the values will exceed (-1, 1) after this step, which is why the author added this check in mel_processing.py:

def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False):
    if torch.min(y) < -1.:
        print('min value is ', torch.min(y))
    if torch.max(y) > 1.:
        print('max value is ', torch.max(y))
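
As a tiny illustration of the arithmetic above (nothing repo-specific, just the 16-bit case):

# max_wav_value = 2**(bit_depth - 1); dividing by it maps int16 samples into (-1, 1)
import numpy as np

bit_depth = 16
max_wav_value = 2 ** (bit_depth - 1)                    # equals the config's 32768.0
samples = np.array([-32768, 0, 32767], dtype=np.int16)
print(samples / max_wav_value)                          # samples scaled into [-1, 1)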

AlexandaJerry commented 1 year ago

Judging from your follow-up, I don't think the error here is a bit-depth problem after all. So I went to the line from the traceback, File "/content/whisper-vits-japanese/data_utils.py", line 83, in get_audio: spec = spectrogram_torch(audio_norm, self.filter_length, and looked for possible causes there.

If the function

def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False):
    if torch.min(y) < -1.:
        print('min value is ', torch.min(y))

were running normally, it would print a value rather than raise min(): Expected reduction dim to be specified for input.numel() == 0.

I then found the following: "If the input tensor becomes empty, torch.max() will give an error RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument." (https://discuss.pytorch.org/t/convert-tensorflow-code-to-pytorch/69996/4). So the more likely problem here is that the audio_norm passed into spec = spectrogram_torch(audio_norm) is an empty tensor. Some of the audio files may be broken: you can sort the sliced audio by file size and check whether there are extremely small files, and also look for hidden files that end in .wav but have no size.
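
A minimal sketch of that check, assuming the sliced clips live in a sliced_audio/ directory (the path and the 1 KB threshold are illustrative, not part of the repo):

# List wav files that are suspiciously small or contain zero audio frames,
# since an empty clip becomes an empty audio_norm tensor downstream.
import glob
import os
import wave

for path in sorted(glob.glob("sliced_audio/*.wav"), key=os.path.getsize):
    size = os.path.getsize(path)
    try:
        with wave.open(path, "rb") as f:
            frames = f.getnframes()
    except wave.Error:
        frames = 0  # unreadable header: treat as broken
    if size < 1024 or frames == 0:
        print(f"{path}: {size} bytes, {frames} frames")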

AlexandaJerry commented 1 year ago

No worries, don't mention it! Debugging together is fun and worthwhile; I hope the problem gets solved soon!

AlexandaJerry commented 1 year ago

If the audio files turn out to be fine, another possible cause of the error that comes to mind is that the source code never specifies the dimension for the min/max reduction. Looking at the path from

audio_norm = audio / self.max_wav_value
audio_norm = audio_norm.unsqueeze(0)
spec = spectrogram_torch(audio_norm, self.filter_length)

to

def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False):
    if torch.min(y) < -1.:
        print('min value is ', torch.min(y))

audio_norm has an extra dimension added before it reaches torch.min(). Checking the PyTorch 1.13 docs, torch.min(input, dim, keepdim=False, *, out=None) takes a dim argument, so we could try commenting out lines 52-55 of mel_processing.py so they don't run, or adding the dim like this:

def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False):
    if torch.min(y,1) < -1.:
        print('min value is ', torch.min(y,1))
    if torch.max(y,1) > 1.:
        print('max value is ', torch.max(y,1))
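
For what it's worth, the empty-tensor explanation from the earlier comment can be reproduced in isolation (purely illustrative, not repo code): torch.min(t) with no dim argument works on an ordinary tensor and only raises the error from the traceback when the tensor has zero elements.

import torch

print(torch.min(torch.tensor([[0.1, -0.5]])))  # tensor(-0.5000); no dim needed

try:
    torch.min(torch.empty(1, 0))  # numel() == 0, like an empty audio_norm
except RuntimeError as err:
    print(err)  # min(): Expected reduction dim to be specified for input.numel() == 0. ...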

KJZH001 commented 1 year ago

Hi, I left an overseas server running for an afternoon and pulled the files back. Sorting by size, I did indeed find some abnormal wav files (about 1 KB each), roughly 115 of them. There are also some files of around 2-30 KB, and I'm trying to find every file with zero duration and separate them out. Here is the relevant info:

Input #0, wav, from 'C:\Users\用户名\Downloads[test]\$4$min1411 #18074_0.wav':
  Duration: N/A, bitrate: 352 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, s16, 352 kb/s

You can see the duration is zero, so I suppose once I delete these, training should run normally?

KJZH001 commented 1 year ago

I ran into something a bit odd and wanted to ask whether you've seen it before. After sorting the audio files by size, everything from 1 KB to 43 KB shows a duration of 0 s. The 1 KB ones contain no sound at all, but when I got to the files around 40 KB+ and listened to a few, they still show 0 s yet actually have audio in them. To be safe, I've moved every file showing 0 s into a separate folder (including the ones that do have sound).

AlexandaJerry commented 1 year ago

Got it! After deleting those tiny audio files, does training run normally now?

KJZH001 commented 1 year ago

Apparently not.

[INFO] {'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 1234, 'epochs': 800, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 24, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/train_filelist.txt.cleaned', 'validation_files': 'filelists/val_filelist.txt.cleaned', 'text_cleaners': ['japanese_cleaners'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0, 'cleaned_text': True}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs/isla_base'}
2023-03-07 04:39:27.997241: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-07 04:39:29.389976: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-07 04:39:29.390120: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-07 04:39:29.390147: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "train.py", line 295, in <module>
    main()
  File "train.py", line 55, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/whisper-vits-japanese/train.py", line 71, in run
    train_dataset = TextAudioLoader(hps.data.training_files, hps.data)
  File "/content/whisper-vits-japanese/data_utils.py", line 38, in __init__
    self._filter()
  File "/content/whisper-vits-japanese/data_utils.py", line 54, in _filter
    lengths.append(os.path.getsize(audiopath) // (2 * self.hop_length))
  File "/usr/lib/python3.8/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '/content/whisper-vits-japanese/sliced_audio/$4$min1443 #18139_10.wav'

Now it says it can't find the corresponding audio file, and even after putting the files larger than 1 KB back, it still reports the same error.

AlexandaJerry commented 1 year ago

Er... if you delete audio files, you of course have to update the training and validation set paths, e.g. "train_filelist.txt". And since the audio is broken, once deleted it should not be put back; audio with no duration cannot be kept. From this it looks like the code is fine and the problem is with the audio, so it shouldn't be on my side. Good luck with the debugging!
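
A minimal sketch of that filelist clean-up, assuming the plain wav_path|transcription layout (the .cleaned files would need to be regenerated afterwards; treat this as illustrative only):

# Drop filelist entries whose wav file is missing or empty, so that
# train_filelist.txt / val_filelist.txt match the audio actually on disk.
import os

for name in ("filelists/train_filelist.txt", "filelists/val_filelist.txt"):
    with open(name, encoding="utf-8") as f:
        lines = f.readlines()
    kept = [line for line in lines
            if os.path.isfile(line.split("|")[0])
            and os.path.getsize(line.split("|")[0]) > 0]
    with open(name, "w", encoding="utf-8") as f:
        f.writelines(kept)
    print(f"{name}: kept {len(kept)} of {len(lines)} entries")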

KJZH001 commented 1 year ago

Thanks for your help in any case. As for why I tried putting files back when I saw some were missing: I had simply assumed that the Whisper + auto-slicing pipeline that builds the training set would skip any audio it couldn't recognize, so I figured some actually usable audio had been caught by mistake and wanted to put it back and see.

So next I'll go back to the Whisper step with the cleaned-up audio, regenerate the training set, and run it through again. Wish me a clean run this time!

KJZH001 commented 1 year ago
[INFO] {'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 1234, 'epochs': 800, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 24, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/train_filelist.txt.cleaned', 'validation_files': 'filelists/val_filelist.txt.cleaned', 'text_cleaners': ['japanese_cleaners'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0, 'cleaned_text': True}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs/isla_base'}
2023-03-07 11:40:46.483137: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-07 11:40:46.483270: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-07 11:40:46.483292: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at ../aten/src/ATen/EmptyTensor.cpp:31.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py:197: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [1, 9, 96], strides() = [23136, 96, 1]
bucket_view.sizes() = [1, 9, 96], strides() = [864, 96, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:325.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[INFO] Train Epoch: 1 [0%]
[INFO] [6.065838813781738, 6.065133094787598, 0.4799965023994446, 102.53410339355469, 2.036228656768799, 216.6588592529297, 0, 0.0002]
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
terminate called without an active exception
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f3806484790>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1466, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1430, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
  File "/usr/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
    if not wait([self.sentinel], timeout):
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 44412) is killed by signal: Aborted. 
[INFO] Saving model and optimizer state at iteration 1 to ./logs/isla_base/G_0.pth
[INFO] Saving model and optimizer state at iteration 1 to ./logs/isla_base/D_0.pth
[INFO] ====> Epoch: 1
[INFO] Train Epoch: 2 [28%]

System RAM 8.6 / 12.7 GB

GPU RAM 12.1 / 15.0 GB

There are warnings all over the place, but it's finally running. Thanks once more for your help and for this project's contribution to open source.

And here's hoping the rest of my training goes smoothly.

AlexandaJerry commented 1 year ago

You're welcome! Once the first epoch runs through, the rest will be fine.

KJZH001 commented 1 year ago

Hi, I'm continuing the training now and am at roughly epoch 90. Based on my actual experience, I'd like to offer a small suggestion.

Since the Colab environment is Ubuntu-based and (if I remember correctly) comes with the zip and unzip commands built in, I wonder whether it would be better to pack the training set into an archive before copying it to Drive during backup.

From reliable information online, Colab's Drive mount transfers data over the network, which makes transferring large numbers of small files quite slow, while Colab machines have gigabit bandwidth; packing everything first should in theory transfer much faster (copying directly, my last run of the backup cell took 28 minutes, which honestly doesn't feel great).

Also, some people may have different needs, for example mounting OneDrive or other cloud storage with rclone instead of Colab's built-in mount. In that case it's hard to tell whether a large upload of many files has actually finished (since non-Pro Colab has no terminal access).

Of course, I understand you may have your own reasons for copying files directly; this is just my personal suggestion.

Raincarnator commented 1 year ago

Hi, I happened to see this issue and found we had the same idea. Here's how I do it, for reference.

# Save checkpoints to Google Drive
!rm -rf /content/drive/MyDrive/vits/logs.zip
!zip -q -r /content/drive/MyDrive/vits/logs.zip /content/whisper-vits-japanese/logs
# Save the audio files and their transcripts to Google Drive
!rm -rf /content/drive/MyDrive/vits/sliced_audio.zip
!zip -q -r /content/drive/MyDrive/vits/sliced_audio.zip /content/whisper-vits-japanese/sliced_audio
!rm -rf /content/drive/MyDrive/vits/filelists.zip
!zip -q -r /content/drive/MyDrive/vits/filelists.zip /content/whisper-vits-japanese/filelists

Because zip, when the target archive already exists, adds the new content to it instead of repacking from scratch, the archive keeps growing and can quickly fill GDrive's 15 GB. So before packing I manually delete the redundant models inside logs and remove the old archive (and of course I download a local backup every time).

# Restore checkpoints from Google Drive to the working folder
!unzip -o /content/drive/MyDrive/vits/logs.zip "*" -d /content/..
# Restore the audio files and their transcripts
!unzip -j -o /content/drive/MyDrive/vits/sliced_audio.zip "*" -d /content/whisper-vits-japanese/sliced_audio
!unzip -j -o /content/drive/MyDrive/vits/filelists.zip "*" -d /content/whisper-vits-japanese/filelists

Besides that, I also wrote a text cleaning tool to make it easier to work with inference front ends such as MoeTTS.

#@title Text cleaning tool
import text
clean_text = "" #@param {type:"string"}
print(text._clean_text(clean_text, hps.data.text_cleaners))

Hope this is useful to other model trainers.

AlexandaJerry commented 1 year ago

You're a legend!