mr-segfault opened 1 year ago
Updated to the current version, which no longer needs the total_fea.npy output, and the error still persists:
INFO:train5:Saving model and optimizer state at epoch 20 to ./logs/train5/G_20.pth
INFO:train5:Saving model and optimizer state at epoch 20 to ./logs/train5/D_20.pth
INFO:train5:====> Epoch: 20
INFO:train5:Training is done. The program is closed.
INFO:train5:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
Tried again (4/27) with a current build; it looks like the Train Feature Index feature is still broken (it does not prepare the index file).
Logs are slightly different because the code has been changing but the underlying issue is still present.
INFO:train6:====> Epoch: 20
INFO:train6:Training is done. The program is closed.
INFO:train6:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
I was worried that my lower-powered PC was causing the issue, so I tried it on my more powerful one. The logs are shorter but still show the bug; maybe that makes it easier to chase down?
INFO:test2:====> Epoch: 20
INFO:test2:Training is done. The program is closed.
INFO:test2:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
+1 facing the exact same issue.
Which GPU do you have: a 10xx, 16xx, P40, or something else?
I was getting this on a dual 3060 (12 GB each) as well as on a 1660 Ti.
@RVC-Boss Google Colab - Tesla T4
INFO:model:Training is done. The program is closed.
INFO:model:saving final ckpt:Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/train/process_ckpt.py", line 79, in savee
torch.save(opt, "weights/%s.pth" % name)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 440, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 315, in _open_zipfile_writer
return container(name_or_buffer)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 288, in __init__
super().__init__(torch._C.PyTorchFileWriter(str(name)))
RuntimeError: Parent directory weights/content/dataset does not exist.
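For what it's worth, that RuntimeError comes from torch.serialization refusing to create missing parent directories, and the path "weights/content/dataset" suggests the model name handed to savee contained path separators. A minimal sketch of a guard, assuming that's the case (save_ckpt_safely is a hypothetical helper, not code from process_ckpt.py):

```python
import os

import torch


def save_ckpt_safely(opt, name, weights_dir="weights"):
    # Hypothetical guard: if the model name contains path separators
    # (e.g. "content/dataset"), torch.save() would target a parent
    # directory that does not exist and raise RuntimeError.
    safe_name = os.path.basename(name.rstrip("/"))  # strip path components
    os.makedirs(weights_dir, exist_ok=True)         # ensure weights/ exists
    torch.save(opt, os.path.join(weights_dir, "%s.pth" % safe_name))
```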
Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 1535, in
Save model Zip to Drive
cp: cannot stat '/content/Retrieval-based-Voice-Conversion-WebUI/logs/model/added*.index': No such file or directory
cp: cannot stat '/content/Retrieval-based-Voice-Conversion-WebUI/logs/model/total.npy': No such file or directory
cp: cannot stat '/content/Retrieval-based-Voice-Conversion-WebUI/weights/model.pth': No such file or directory
/content/zips/model
zip warning: name not matched:
zip error: Nothing to do! (try: zip -r model.zip . -i *)
mv: cannot stat 'model.zip': No such file or directory
/content/Retrieval-based-Voice-Conversion-WebUI
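Those cp/zip failures are downstream symptoms: training crashed before the index and weights were written, so the "Save model Zip to Drive" cell has nothing to copy. A small pre-flight check one could add to the notebook to report the missing files up front (missing_artifacts is a hypothetical helper; the patterns mirror the cp commands above):

```python
import glob
import os

BASE = "/content/Retrieval-based-Voice-Conversion-WebUI"

def missing_artifacts(model_name):
    # Check which of the files the Colab cell tries to cp/zip actually
    # exist, so the failure is reported once instead of as a cascade
    # of cp/zip/mv errors.
    patterns = [
        os.path.join(BASE, "logs", model_name, "added*.index"),
        os.path.join(BASE, "logs", model_name, "total.npy"),
        os.path.join(BASE, "weights", "%s.pth" % model_name),
    ]
    return [p for p in patterns if not glob.glob(p)]

print(missing_artifacts("model"))  # lists every pattern with no match
```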
> I was getting this on a dual 3060 (12 GB each) as well as on a 1660 Ti.
You can try my approach: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/165#issuecomment-1524532607
+1, facing the exact same issue. When I train for 50 epochs on Colab it produces the .index file and I can copy it to Google Drive, but when I train for 200 epochs the .index file is missing and cannot be copied to Google Drive.
This issue is still present in the current version (as of this writing), though the behavior is different: a file is written, but the application still throws a torch exception:
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
I am stuck with the same problem; can someone point me to a solution?
I have the same errors in the log as the author of this thread, on the latest version of the application. Training finishes and the .pth model is available, but the index file is not created. Has anyone found a solution to this problem?
Same question.
Wow, a lot of people are having this issue with no resolution?
Using the current version of RVC (I pulled the latest just before writing this report): when training, it generates the weights but does not generate the feature file or database file required for inference. No index file is created.
I'm using an Ubuntu system, and I installed RVC in a venv to ensure there were no conflicts. This is the tail end of the training output and the crash log:
INFO:model3:Saving model and optimizer state at epoch 200 to ./logs/model3/G_200.pth
INFO:model3:Saving model and optimizer state at epoch 200 to ./logs/model3/D_200.pth
INFO:model3:====> Epoch: 200
INFO:model3:Training is done. The program is closed.
INFO:model3:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in <module>
main()
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
mp.spawn(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1039, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/utils.py", line 491, in async_iteration
return next(iterator)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 844, in train1key
big_npy = np.concatenate(npys, 0)
File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
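For anyone triaging this: the final ValueError means the index-building step in train1key found zero feature .npy files to concatenate, so the root cause is the earlier training-process crash (exit code 149), not the index code itself. A minimal sketch of a guard that would surface the real problem, assuming the features live in a per-experiment directory (build_big_npy and the directory layout are illustrative, not the actual infer-web.py code):

```python
import os

import numpy as np


def build_big_npy(feature_dir):
    # Hypothetical guard around the step that crashes in train1key:
    # np.concatenate() raises "need at least one array to concatenate"
    # when the feature directory yields no .npy files at all.
    names = sorted(f for f in os.listdir(feature_dir) if f.endswith(".npy"))
    if not names:
        raise FileNotFoundError(
            "no feature .npy files found in %s; feature extraction probably "
            "failed or wrote to a different experiment directory" % feature_dir
        )
    npys = [np.load(os.path.join(feature_dir, name)) for name in names]
    return np.concatenate(npys, 0)
```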