Open Sndragon88 opened 11 months ago
Hi, this is a common problem with the TTS library (https://github.com/coqui-ai/TTS/issues/3384#issue-2031525792), but I was able to work around it, and my custom xtts_finetune_webui can now train Japanese.
finetune quality
https://github.com/daswer123/xtts-webui/assets/22278673/b4a394c6-419e-4d89-a5d7-d33db0dc1e70
Thanks, your new code works. I completed one finetune (with a small amount of audio data) all the way through the final inference test.
Afterwards, I ran another finetune with more training data. This time it gets past Epoch 5/6, then fails with PermissionError: [WinError 5] Access is denied. If I run the .bat file as administrator, the cmd window reports a missing file in system32 and stops. Could the "Clear train data" option be the cause? I tried setting it to "None", but the same error occurred.
Edit: I found that deleting the finetune_model/run folder and running step 2 again works. So this may happen when we make a second finetune and the program can’t delete a file in this folder.
Traceback (most recent call last):
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\gradio\queueing.py", line 459, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\gradio\blocks.py", line 1533, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\gradio\blocks.py", line 1151, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\gradio\utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\xtts_finetune_webui.py", line 339, in train_model
    os.remove(run_dir)
PermissionError: [WinError 5] Access is denied: 'D:\Long\AI\Audio\xtts-webui\finetunemodels\run'
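The last frame of this traceback calls os.remove(run_dir) on what appears to be a folder. os.remove only deletes files. When given a directory it raises an OSError, which on Windows typically shows up as PermissionError [WinError 5], and running as administrator does not change that. A minimal sketch of a cleanup helper that handles both cases follows. The name clear_run_dir is hypothetical and not part of xtts-webui, and this is one possible approach rather than the project's actual fix.

```python
import shutil
from pathlib import Path


def clear_run_dir(run_dir):
    """Delete run_dir whether it is a file, a symlink, or a directory tree.

    Hypothetical helper for illustration; not part of xtts-webui.
    Does nothing if the path does not exist.
    """
    path = Path(run_dir)
    if path.is_symlink() or path.is_file():
        # Plain files and symlinks are removed with unlink (same as os.remove).
        path.unlink()
    elif path.is_dir():
        # Directories need a recursive delete; os.remove fails on them.
        shutil.rmtree(path)


if __name__ == "__main__":
    import tempfile

    # Build a throwaway "run" folder with a nested file, then clear it.
    base = Path(tempfile.mkdtemp())
    run = base / "run"
    (run / "checkpoints").mkdir(parents=True)
    (run / "checkpoints" / "dummy.txt").write_text("x")
    clear_run_dir(run)
    print(run.exists())
```

Note that shutil.rmtree can still raise PermissionError on Windows if another process holds a file inside the folder open, so deleting the folder by hand before rerunning step 2 remains a reasonable workaround.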
My WAV file was converted to mono, 22050 Hz, 16-bit PCM beforehand. I got this error log:
Existing language matches target language
Loading Whisper Model!
Discarding ID3 tags because more suitable tags were found.
Traceback (most recent call last):
  File "D:\Long\AI\Audio\xtts-webui\xtts_finetune_webui.py", line 246, in preprocess_dataset
    train_meta, eval_meta, audio_total_size = format_audio_list(audio_path, whisper_model = whisper_model, target_language=language, out_path=out_path, gradio_progress=progress)
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\scripts\utils\formatter.py", line 160, in format_audio_list
    sentence = multilingual_cleaners(sentence, target_language)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\TTS\tts\layers\xtts\tokenizer.py", line 558, in multilingual_cleaners
    text = expand_numbers_multilingual(text, lang)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Long\AI\Audio\xtts-webui\venv\Lib\site-packages\TTS\tts\layers\xtts\tokenizer.py", line 538, in expand_numbers_multilingual
    text = re.sub(_ordinal_re[lang], lambda m: _expand_ordinal(m, lang), text)
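The traceback passes through the text cleaners rather than any audio loading code, so the input format is probably not the cause. To rule it out anyway, the WAV header can be read with Python's standard wave module and compared against the stated mono, 22050 Hz, 16-bit PCM. A minimal sketch follows. The helper names describe_wav and is_expected_format are hypothetical, not part of xtts-webui.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, and bit depth of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            # getsampwidth() is in bytes per sample, so convert to bits.
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def is_expected_format(path, channels=1, sample_rate=22050, bits_per_sample=16):
    """Check a WAV file against the expected format (defaults from the post)."""
    expected = {
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
    }
    return describe_wav(path) == expected


if __name__ == "__main__":
    import os
    import tempfile

    # Write one second of silence in the expected format, then verify it.
    fd, demo_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes = 16 bits
        out.setframerate(22050)
        out.writeframes(b"\x00\x00" * 22050)
    print(describe_wav(demo_path))
    print(is_expected_format(demo_path))
    os.remove(demo_path)
```

The wave module only reads uncompressed PCM WAV files, so a wave.Error here would itself suggest the file is not in the format described.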