effusiveperiscope / so-vits-svc


Training a pre-trained model #25

Closed mya2152 closed 1 year ago

mya2152 commented 1 year ago

I currently only have the G_2100.pth file. I can run inference and produce vocals with it and the config.json. However, when I want to train the model further, I put the file in /logs/44k/, but running the train.py script keeps giving me this error:

File "/notebooks/so-vits-svc/train.py", line 120, in run raise Exception("No pretrained model found") Exception: No pretrained model found

I also tried renaming it to "G_0.pth", like the original HF pretrained model, but I still get the message. Unfortunately the instance was shut down and I only managed to save the generator "G" file and the configs.json. Is it possible to continue training from this?

Thanks in advance

effusiveperiscope commented 1 year ago

I inserted that exception into the training code mostly to prevent myself from accidentally training without a pre-trained model; I haven't run into a situation where it gets raised outside of that. I would suggest adding a traceback.print_exc() above the line where the exception is raised to see what the actual problem is.
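Roughly, the pattern would look like this (just a sketch, not the exact code in this repo; `load_latest_checkpoints` is a stand-in for whatever checkpoint-loading calls are in your copy of train.py):

```python
import traceback

def load_latest_checkpoints():
    # stand-in for the real G_*/D_* checkpoint loading in train.py
    raise FileNotFoundError("example failure")

try:
    load_latest_checkpoints()
except Exception:
    traceback.print_exc()  # prints the underlying error before it gets masked
    raise Exception("No pretrained model found")
```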

mya2152 commented 1 year ago

FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/notebooks/so-vits-svc/data_utils.py", line 88, in __getitem__
    return self.get_audio(self.audiopaths[index][0])
  File "/notebooks/so-vits-svc/data_utils.py", line 62, in get_audio
    f0 = np.load(filename + ".f0.npy")
  File "/usr/local/lib/python3.9/dist-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './dataset/44k/jm12/voca111101_67.wav.f0.npy'

mya2152 commented 1 year ago

Strange, because I already ran the hubert and f0 generation preprocessing step.

effusiveperiscope commented 1 year ago

If you're doing this through a service like Colab, is it possible that the dataset folder isn't mounted correctly?
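A quick way to check, independent of the trainer (a rough sketch; the dataset path and the .f0.npy suffix are taken from your traceback):

```python
# List dataset wavs that are missing their pitch feature file.
import glob
import os

missing = [wav for wav in glob.glob("./dataset/44k/**/*.wav", recursive=True)
           if not os.path.exists(wav + ".f0.npy")]

print(f"{len(missing)} wav(s) without a .f0.npy")
for path in missing[:20]:
    print(path)
```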

mya2152 commented 1 year ago

It turns out the hubert and f0 preprocessing step isn't actually generating the ".f0.npy" files in the dataset folder anymore, for some reason.

Edit: never mind, I got them to generate successfully, but now I'm getting the problem below.

mya2152 commented 1 year ago

When I commented out the raise Exception at line 120 in train.py, it reported that loading the checkpoint failed but then started training, though only for a minute or so:

INFO:44k:Saving model and optimizer state at iteration 1 to ./logs/44k/G_0.pth
INFO:44k:Saving model and optimizer state at iteration 1 to ./logs/44k/D_0.pth
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
Traceback (most recent call last):
  File "/notebooks/so-vits-svc/train.py", line 328, in <module>
    main()
  File "/notebooks/so-vits-svc/train.py", line 63, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/notebooks/so-vits-svc/train.py", line 137, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler,
  File "/notebooks/so-vits-svc/train.py", line 159, in train_and_evaluate
    for batch_idx, items in enumerate(train_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/notebooks/so-vits-svc/data_utils.py", line 88, in __getitem__
    return self.get_audio(self.audiopaths[index][0])
  File "/notebooks/so-vits-svc/data_utils.py", line 43, in get_audio
    audio, sampling_rate = load_wav_to_torch(filename)
  File "/notebooks/so-vits-svc/sovits_utils.py", line 417, in load_wav_to_torch
    sampling_rate, data = read(full_path)
  File "/usr/local/lib/python3.9/dist-packages/scipy/io/wavfile.py", line 707, in read
    return fs, data
UnboundLocalError: local variable 'fs' referenced before assignment
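That last UnboundLocalError comes from scipy's wavfile.read, which usually suggests one of the wavs in the dataset can't be fully parsed (for example a truncated file or something that isn't a plain PCM RIFF wav). A rough sketch to find the offending file(s), using the dataset path from the traceback:

```python
# Scan the dataset for wav files scipy cannot read.
import glob
from scipy.io import wavfile

for path in glob.glob("./dataset/44k/**/*.wav", recursive=True):
    try:
        sr, data = wavfile.read(path)
        if data.size == 0:
            print(path, "-> empty audio")
    except Exception as exc:
        print(path, "->", repr(exc))
```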

mya2152 commented 1 year ago

> If you're doing this through a service like Colab, is it possible that the dataset folder isn't mounted correctly?

I figured that having only the "configs.json" and the "G_2100.pth" would hopefully be enough to resume training. Am I missing something, or does it just not work because I'm missing some sort of checkpoint file that I couldn't save earlier?

effusiveperiscope commented 1 year ago

You need the D_* checkpoint during training too.
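A quick way to see which steps have a complete G/D pair (a rough sketch; assumes the default ./logs/44k layout and the G_<step>.pth / D_<step>.pth naming):

```python
# Report checkpoint steps that are missing either the G_*.pth or D_*.pth file.
import glob
import os
import re

log_dir = "./logs/44k"
steps = {}
for path in glob.glob(os.path.join(log_dir, "[GD]_*.pth")):
    m = re.match(r"([GD])_(\d+)\.pth$", os.path.basename(path))
    if m:
        kind, step = m.group(1), int(m.group(2))
        steps.setdefault(step, set()).add(kind)

for step in sorted(steps):
    if steps[step] != {"G", "D"}:
        print(f"step {step}: only {' and '.join(sorted(steps[step]))} present")
```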

mya2152 commented 1 year ago

> You need the D_* checkpoint during training too.

So the D file has to be the exact same one that was produced alongside the G file, it seems. I was under the impression you could take any D file and couple it with a G file to continue the pre-training.

effusiveperiscope commented 1 year ago

It's a GAN; the discriminator and generator (D and G) are trained at the same time, so a D checkpoint only makes sense together with the G checkpoint it was saved with.
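For illustration only (a minimal stand-in, nothing like the actual so-vits-svc networks): each side has its own weights and optimizer state, which is why resuming needs a matching G_*.pth / D_*.pth pair from the same run.

```python
# Minimal GAN-style training step, to show why both checkpoints are needed to resume.
import torch
import torch.nn as nn

G = nn.Linear(16, 16)                         # stand-in generator
D = nn.Linear(16, 1)                          # stand-in discriminator
opt_g = torch.optim.AdamW(G.parameters(), lr=1e-4)
opt_d = torch.optim.AdamW(D.parameters(), lr=1e-4)

x = torch.randn(8, 16)                        # "real" batch
z = torch.randn(8, 16)                        # latent input

# discriminator step: push D(x) toward 1 and D(G(z)) toward 0
opt_d.zero_grad()
loss_d = ((D(x) - 1) ** 2).mean() + (D(G(z).detach()) ** 2).mean()
loss_d.backward()
opt_d.step()

# generator step: push D(G(z)) toward 1
opt_g.zero_grad()
loss_g = ((D(G(z)) - 1) ** 2).mean()
loss_g.backward()
opt_g.step()

# resuming later requires restoring all four state dicts, saved at the same step
torch.save({"model": G.state_dict(), "optimizer": opt_g.state_dict()}, "G_demo.pth")
torch.save({"model": D.state_dict(), "optimizer": opt_d.state_dict()}, "D_demo.pth")
```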