IAHispano / Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.
https://applio.org
MIT License

[Bug]: Error when training on Kaggle #824

Open · Kuchiriel opened this issue 1 week ago

Kuchiriel commented 1 week ago

Project Version

Latest

Platform and OS Version

Kaggle

Affected Devices

Kaggle Latest Environment

Existing Issues

No response

What happened?

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/kaggle/working/program_ml/rvc/train/train.py", line 509, in run
    train_and_evaluate(
  File "/kaggle/working/program_ml/rvc/train/train.py", line 707, in train_and_evaluate
    scaler.scale(loss_disc).backward()
  File "/kaggle/tmp/.venv/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "/kaggle/tmp/.venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/kaggle/tmp/.venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:534] Connection closed by peer [172.19.2.2]:48294
/opt/conda/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 42 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
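Why the failure shows up inside backward(): the traceback passes through scaler.scale(loss_disc).backward() and ends in gloo's TCP transport. That pattern is consistent with the model being wrapped in DistributedDataParallel (an assumption, not confirmed from train.py), whose autograd hooks all-reduce gradients across processes during the backward pass. If another training process has exited or dropped its connection, the surviving process only notices at that all-reduce. The sketch below is NOT Applio's code. It is a minimal, hypothetical single-process reproduction of the same call path on CPU. The helper names free_port and train_steps, the toy Linear model, and the disabled GradScaler are all illustrative choices.

```python
import socket

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def free_port() -> int:
    # Hypothetical helper: ask the OS for an unused TCP port for the rendezvous.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def train_steps(rank: int, world_size: int, port: int, steps: int = 3) -> float:
    # Same backend as in the traceback: gloo over TCP.
    dist.init_process_group(
        "gloo",
        init_method=f"tcp://127.0.0.1:{port}",
        rank=rank,
        world_size=world_size,
    )
    try:
        torch.manual_seed(0)
        model = DDP(torch.nn.Linear(4, 1))  # toy CPU model, so no device_ids
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        # Disabled scaler (we are on CPU): scale() returns the loss unchanged
        # and step() simply calls opt.step().
        scaler = torch.cuda.amp.GradScaler(enabled=False)
        loss = torch.tensor(0.0)
        for _ in range(steps):
            loss = model(torch.randn(8, 4)).pow(2).mean()
            opt.zero_grad()
            # DDP's hooks all-reduce gradients across ranks inside backward();
            # if a peer rank has died, this is where gloo would raise
            # "Connection closed by peer".
            scaler.scale(loss).backward()
            scaler.step(opt)
            scaler.update()
        return loss.item()
    finally:
        # Always tear down the process group so it can be re-initialized.
        dist.destroy_process_group()


if __name__ == "__main__":
    # One rank keeps the sketch self-contained; with world_size > 1, every
    # rank must run this loop (e.g. via torch.multiprocessing.spawn).
    print(train_steps(rank=0, world_size=1, port=free_port()))
```

With more than one rank, killing any one process mid-loop should make the remaining ranks fail at the backward() line rather than at the point where the dead process actually stopped. That is why the root cause may be in a different process's log (for example, an out-of-memory kill), not in this traceback.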

Steps to reproduce

The error occurs during training, somewhere between epoch 100 and epoch 500.

Expected behavior

Training should continue without this error.

Attachments

No response

Screenshots or Videos

No response

Additional Information

No response

aris-py commented 5 days ago

@Vidalnt