mr-segfault opened 1 year ago
Updated to the current version, which no longer needs the total_fea.npy output, and the error still persists:
INFO:train5:Saving model and optimizer state at epoch 20 to ./logs/train5/G_20.pth
INFO:train5:Saving model and optimizer state at epoch 20 to ./logs/train5/D_20.pth
INFO:train5:====> Epoch: 20
INFO:train5:Training is done. The program is closed.
INFO:train5:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
Tried again (4/27) with a current build; it looks like the Train Feature Index feature is still broken (it does not prepare the index file).
Logs are slightly different because the code has been changing but the underlying issue is still present.
INFO:train6:====> Epoch: 20
INFO:train6:Training is done. The program is closed.
INFO:train6:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
I was worried that my lower-powered PC was causing the issue, so I tried it on my more powerful one. The logs are shorter but still show the bug; maybe that makes it easier to chase down?
INFO:test2:====> Epoch: 20
INFO:test2:Training is done. The program is closed.
INFO:test2:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
+1 facing the exact same issue.
Which GPU do you have: a 10xx, 16xx, P40, or something else?
I was getting this on a dual 3060 (12 GB each) as well as on a 1660 Ti.
@RVC-Boss Google Colab - Tesla T4
INFO:model:Training is done. The program is closed.
INFO:model:saving final ckpt:Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/train/process_ckpt.py", line 79, in savee
torch.save(opt, "weights/%s.pth" % name)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 440, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 315, in _open_zipfile_writer
return container(name_or_buffer)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 288, in __init__
super().__init__(torch._C.PyTorchFileWriter(str(name)))
RuntimeError: Parent directory weights/content/dataset does not exist.
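For what it's worth, that RuntimeError comes from torch.serialization refusing to create missing parent directories, and the path "weights/content/dataset" suggests the model name handed to savee contained path separators. A minimal sketch of a guard, assuming that's the case (save_ckpt_safely is a hypothetical helper, not code from process_ckpt.py):

```python
import os

import torch


def save_ckpt_safely(opt, name, weights_dir="weights"):
    # Hypothetical guard: if the model name contains path separators
    # (e.g. "content/dataset"), torch.save() would target a parent
    # directory that does not exist and raise RuntimeError.
    safe_name = os.path.basename(name.rstrip("/"))  # strip path components
    os.makedirs(weights_dir, exist_ok=True)         # ensure weights/ exists
    torch.save(opt, os.path.join(weights_dir, "%s.pth" % safe_name))
```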
Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 1535, in
Save model Zip to Drive
cp: cannot stat '/content/Retrieval-based-Voice-Conversion-WebUI/logs/model/added*.index': No such file or directory
cp: cannot stat '/content/Retrieval-based-Voice-Conversion-WebUI/logs/model/total.npy': No such file or directory
cp: cannot stat '/content/Retrieval-based-Voice-Conversion-WebUI/weights/model.pth': No such file or directory
/content/zips/model
zip warning: name not matched:
zip error: Nothing to do! (try: zip -r model.zip . -i *)
mv: cannot stat 'model.zip': No such file or directory
/content/Retrieval-based-Voice-Conversion-WebUI
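Those cp/zip failures are downstream symptoms: training crashed before the index and weights were written, so the "Save model Zip to Drive" cell has nothing to copy. A small pre-flight check one could add to the notebook to report the missing files up front (missing_artifacts is a hypothetical helper; the patterns mirror the cp commands above):

```python
import glob
import os

BASE = "/content/Retrieval-based-Voice-Conversion-WebUI"

def missing_artifacts(model_name):
    # Check which of the files the Colab cell tries to cp/zip actually
    # exist, so the failure is reported once instead of as a cascade
    # of cp/zip/mv errors.
    patterns = [
        os.path.join(BASE, "logs", model_name, "added*.index"),
        os.path.join(BASE, "logs", model_name, "total.npy"),
        os.path.join(BASE, "weights", "%s.pth" % model_name),
    ]
    return [p for p in patterns if not glob.glob(p)]

print(missing_artifacts("model"))  # lists every pattern with no match
```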
> I was getting this on a dual 3060 (12 GB each) as well as on a 1660 Ti.
You can try my approach: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/165#issuecomment-1524532607
+1, facing the exact same issue. When I train for 50 epochs on Colab it produces the .index file and I can copy it to Google Drive, but when I train for 200 epochs the .index file is missing and cannot be copied to Google Drive.
This issue is still present in the current version (as of this writing), though the behavior is different: a file is written, but the application still throws a torch exception:
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
I am stuck with the same problem; can someone point me to a solution?
I have the same errors in the log as the author of this thread, on the latest version of the application. Training finishes and the .pth model is available, but the index file is not created. Has anyone found a solution to this problem?
Same question.
Wow, a lot of people are having this issue with no resolution?
Using the current version of RVC (I pulled the latest just before writing this report): when training, it generates the weights but does not generate the feature file or database file required for inference. No index file is created.
I'm using an Ubuntu system, and I installed RVC in a venv to ensure there were no conflicts. This is the tail end of the training output and the crash log:
INFO:model3:Saving model and optimizer state at epoch 200 to ./logs/model3/G_200.pth
INFO:model3:Saving model and optimizer state at epoch 200 to ./logs/model3/D_200.pth
INFO:model3:====> Epoch: 200
INFO:model3:Training is done. The program is closed.
INFO:model3:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in <module>
main()
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
mp.spawn(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1039, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/utils.py", line 491, in async_iteration
return next(iterator)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 844, in train1key
big_npy = np.concatenate(npys, 0)
File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
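For anyone triaging this: the final ValueError means the index-building step in train1key found zero feature .npy files to concatenate, so the root cause is the earlier training-process crash (exit code 149), not the index code itself. A minimal sketch of a guard that would surface the real problem, assuming the features live in a per-experiment directory (build_big_npy and the directory layout are illustrative, not the actual infer-web.py code):

```python
import os

import numpy as np


def build_big_npy(feature_dir):
    # Hypothetical guard around the step that crashes in train1key:
    # np.concatenate() raises "need at least one array to concatenate"
    # when the feature directory yields no .npy files at all.
    names = sorted(f for f in os.listdir(feature_dir) if f.endswith(".npy"))
    if not names:
        raise FileNotFoundError(
            "no feature .npy files found in %s; feature extraction probably "
            "failed or wrote to a different experiment directory" % feature_dir
        )
    npys = [np.load(os.path.join(feature_dir, name)) for name in names]
    return np.concatenate(npys, 0)
```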