AlexandaJerry / whisper-vits-japanese

VITS Japanese with Whisper as data processor (you can train your VITS even if you only have audio)
MIT License
160 stars 28 forks

Continuing from last time, I got this error again QAQ #4

Open kuuga314 opened 1 year ago

kuuga314 commented 1 year ago

./logs/isla_base/G_10000.pth
[INFO] Loaded checkpoint './logs/isla_base/G_10000.pth' (iteration 34)
./logs/isla_base/D_10000.pth
[INFO] Loaded checkpoint './logs/isla_base/D_10000.pth' (iteration 34)
/usr/local/lib/python3.7/dist-packages/torch/functional.py:607: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:800.)
  normalized, onesided, return_complex)
/usr/local/lib/python3.7/dist-packages/torch/functional.py:607: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at ../aten/src/ATen/EmptyTensor.cpp:31.)
  normalized, onesided, return_complex)
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py:175: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance. grad.sizes() = [1, 9, 96], strides() = [34272, 96, 1] bucket_view.sizes() = [1, 9, 96], strides() = [864, 96, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:312.)
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "train.py", line 295, in <module>
    main()
  File "train.py", line 55, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/whisper-vits-japanese/train.py", line 122, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/content/whisper-vits-japanese/train.py", line 142, in train_and_evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(train_loader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 461, in reraise
    raise exception
EOFError: Caught EOFError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/whisper-vits-japanese/data_utils.py", line 94, in __getitem__
    return self.get_audio_text_pair(self.audiopaths_and_text[index])
  File "/content/whisper-vits-japanese/data_utils.py", line 62, in get_audio_text_pair
    spec, wav = self.get_audio(audiopath)
  File "/content/whisper-vits-japanese/data_utils.py", line 74, in get_audio
    spec = torch.load(spec_filename)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

AlexandaJerry commented 1 year ago

https://github.com/open-mmlab/mmdetection/issues/8204

AlexandaJerry commented 1 year ago

The line `spec = torch.load(spec_filename)` loads the cached spec.pt files under the sliced audio folder. If any of those spec.pt files is corrupted, this error is raised; the usual cause is an interrupted save or an incomplete copy to Drive. If you are comfortable debugging, load the spec.pt files one by one with `spec = torch.load(spec_filename)` to find the broken one. For example, wrap line 74 of data_utils.py in a try/except along these lines:

try:
    spec = torch.load(spec_filename)
    print("Successfully loaded " + str(spec_filename))
except:
    print("Failed to load " + str(spec_filename))
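For convenience, here is a standalone sketch of that same check, run outside of training: it walks the cached spectrogram files and reports every one that `torch.load` cannot read. The folder path and the `*.spec.pt` naming pattern are assumptions, so adjust them to match how data_utils.py names the cached files in your setup.

```python
# Standalone sketch (not part of the repo): find corrupted spec.pt caches.
import glob
import os

import torch

SLICED_AUDIO_DIR = "/content/drive/MyDrive/sliced audio"  # hypothetical path, adjust to yours

bad_files = []
pattern = os.path.join(SLICED_AUDIO_DIR, "**", "*.spec.pt")  # assumed naming pattern
for spec_path in sorted(glob.glob(pattern, recursive=True)):
    try:
        torch.load(spec_path)
        print("loaded OK:", spec_path)
    except Exception as e:  # EOFError ("Ran out of input") indicates a truncated file
        print("CORRUPTED:", spec_path, "->", repr(e))
        bad_files.append(spec_path)

print("corrupted spec.pt files:", bad_files)
# Delete the corrupted caches so they get regenerated on the next preprocessing run:
# for p in bad_files:
#     os.remove(p)
```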

AlexandaJerry commented 1 year ago

The simple version is to look inside the sliced audio folder on Google Drive and check whether any spec.pt is unusually small. A healthy spec.pt can be opened by double-clicking it in Drive, so delete any spec.pt that cannot be opened that way.
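A quick way to spot those unusually small files without clicking through Drive is to list the cached spec.pt files by size; a truncated file usually stands out as far smaller than the rest. As above, the path and filename pattern are assumptions.

```python
# Sketch of the "simple version": sort cached spec.pt files by size so that a
# truncated file stands out. Adjust the path/pattern to your own setup.
import glob
import os

SLICED_AUDIO_DIR = "/content/drive/MyDrive/sliced audio"  # hypothetical path

paths = glob.glob(os.path.join(SLICED_AUDIO_DIR, "**", "*.spec.pt"), recursive=True)
for size, path in sorted((os.path.getsize(p), p) for p in paths):
    print(f"{size:>10} bytes  {path}")
# Files that are drastically smaller than the others were likely cut off mid-save;
# delete them so they can be regenerated.
```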

kuuga314 commented 1 year ago

Thanks, that solved it!!

AlexandaJerry commented 1 year ago

I have since pushed a code update for this issue: if a spec.pt is corrupted, a new one is generated automatically, so a broken file on the Drive no longer matters.
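For reference, a minimal sketch of what such a fallback can look like in `get_audio` of data_utils.py. This is not the repository's actual diff; it follows the structure of the standard VITS loader and assumes the helpers `load_wav_to_torch` and `spectrogram_torch` that data_utils.py already imports. The idea is simply: if the cached spec.pt cannot be loaded, recompute the spectrogram and overwrite the broken cache.

```python
def get_audio(self, filename):
    # Load the sliced wav and normalize it (as in the standard VITS loader).
    audio, sampling_rate = load_wav_to_torch(filename)
    audio_norm = audio / self.max_wav_value
    audio_norm = audio_norm.unsqueeze(0)

    spec_filename = filename.replace(".wav", ".spec.pt")  # assumed cache naming
    try:
        # Reuse the cached spectrogram when the file is intact.
        spec = torch.load(spec_filename)
    except Exception:
        # A missing or truncated cache (e.g. EOFError: "Ran out of input"):
        # recompute the spectrogram and overwrite the broken spec.pt.
        spec = spectrogram_torch(audio_norm, self.filter_length,
                                 self.sampling_rate, self.hop_length,
                                 self.win_length, center=False)
        spec = torch.squeeze(spec, 0)
        torch.save(spec, spec_filename)
    return spec, audio_norm
```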