DrewThomasson / VoxNovel

VoxNovel: generate audiobooks giving each character a different voice actor.
MIT License

Error using new voice with Fine Tuned XTTS model #32

Closed ScratchMode closed 1 month ago

ScratchMode commented 1 month ago

I imported this Clone a Voice sample and this model, and when I tried to generate audio it immediately stalled out with the following error.

Using XTTS V2.0.2 Model Fine Tuned with this bastard of a git project

Added voice actor data to Working_files/Book/book.csv
Added language data to the CSV file.
Voice actor: Jay Snyder.M, en
found fine tuned for voice actor: Jay Snyder.M: loading custom model...
/home/voxnovel/miniconda/envs/VoxNovel/lib/python3.10/site-packages/TTS/utils/io.py:54: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(f, map_location=map_location, **kwargs)
Computing speaker latents...
Inference...
Exception in thread Thread-3 (generate_audio):
Traceback (most recent call last):
  File "/home/voxnovel/miniconda/envs/VoxNovel/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/voxnovel/miniconda/envs/VoxNovel/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/voxnovel/VoxNovel/gui_run.py", line 1914, in generate_audio
    fineTune_audio_generate(text=fragment, file_path=f"Working_files/temp/{temp_count}.wav", speaker_wav=speaker_wavz[0], language=language_code, voice_actor=voice_actor)
  File "/home/voxnovel/VoxNovel/gui_run.py", line 1762, in fineTune_audio_generate
    out = tts.inference(
  File "/home/voxnovel/miniconda/envs/VoxNovel/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/voxnovel/miniconda/envs/VoxNovel/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 532, in inference
    text_tokens = torch.IntTensor(self.tokenizer.encode(sent, lang=language)).unsqueeze(0).to(self.device)
  File "/home/voxnovel/miniconda/envs/VoxNovel/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 653, in encode
    return self.tokenizer.encode(txt).ids
AttributeError: 'NoneType' object has no attribute 'encode'

DrewThomasson commented 1 month ago

It might be that, for my import, you're supposed to import the folder containing all the files. Also, for some reason I had to give the vocab file a trailing underscore, like "vocab.json_" instead of "vocab.json".

Confirmed on my end I get the same error if I incorrectly rename the "vocab.json" file to "vocab.json_"

Error I get when I incorrectly name the vocab.json file:

No API token found for 🐸Coqui Studio voices - https://coqui.ai
Visit 🔗https://app.coqui.ai/account to get one.
Set it as an environment variable `export COQUI_STUDIO_TOKEN=`
['tts_models/multilingual/multi-dataset/xtts_v2', 'tts_models/multilingual/multi-dataset/xtts_v1.1', 'tts_models/multilingual/multi-dataset/your_tts', 'tts_models/multilingual/multi-dataset/bark', 'tts_models/bg/cv/vits', 'tts_models/cs/cv/vits', 'tts_models/da/cv/vits', 'tts_models/et/cv/vits', 'tts_models/ga/cv/vits', 'tts_models/en/ek1/tacotron2', 'tts_models/en/ljspeech/tacotron2-DDC', 'tts_models/en/ljspeech/tacotron2-DDC_ph', 'tts_models/en/ljspeech/glow-tts', 'tts_models/en/ljspeech/speedy-speech', 'tts_models/en/ljspeech/tacotron2-DCA', 'tts_models/en/ljspeech/vits', 'tts_models/en/ljspeech/vits--neon', 'tts_models/en/ljspeech/fast_pitch', 'tts_models/en/ljspeech/overflow', 'tts_models/en/ljspeech/neural_hmm', 'tts_models/en/vctk/vits', 'tts_models/en/vctk/fast_pitch', 'tts_models/en/sam/tacotron-DDC', 'tts_models/en/blizzard2013/capacitron-t2-c50', 'tts_models/en/blizzard2013/capacitron-t2-c150_v2', 'tts_models/en/multi-dataset/tortoise-v2', 'tts_models/en/jenny/jenny', 'tts_models/es/mai/tacotron2-DDC', 'tts_models/es/css10/vits', 'tts_models/fr/mai/tacotron2-DDC', 'tts_models/fr/css10/vits', 'tts_models/uk/mai/glow-tts', 'tts_models/uk/mai/vits', 'tts_models/zh-CN/baker/tacotron2-DDC-GST', 'tts_models/nl/mai/tacotron2-DDC', 'tts_models/nl/css10/vits', 'tts_models/de/thorsten/tacotron2-DCA', 'tts_models/de/thorsten/vits', 'tts_models/de/thorsten/tacotron2-DDC', 'tts_models/de/css10/vits-neon', 'tts_models/ja/kokoro/tacotron2-DDC', 'tts_models/tr/common-voice/glow-tts', 'tts_models/it/mai_female/glow-tts', 'tts_models/it/mai_female/vits', 'tts_models/it/mai_male/glow-tts', 'tts_models/it/mai_male/vits', 'tts_models/ewe/openbible/vits', 'tts_models/hau/openbible/vits', 'tts_models/lin/openbible/vits', 'tts_models/tw_akuapem/openbible/vits', 'tts_models/tw_asante/openbible/vits', 'tts_models/yor/openbible/vits', 'tts_models/hu/css10/vits', 'tts_models/el/cv/vits', 'tts_models/fi/css10/vits', 'tts_models/hr/cv/vits', 'tts_models/lt/cv/vits', 'tts_models/lv/cv/vits', 'tts_models/mt/cv/vits', 'tts_models/pl/mai_female/vits', 'tts_models/pt/cv/vits', 'tts_models/ro/cv/vits', 'tts_models/sk/cv/vits', 'tts_models/sl/cv/vits', 'tts_models/sv/cv/vits', 'tts_models/ca/custom/vits', 'tts_models/fa/custom/glow-tts', 'tts_models/bn/custom/vits-male', 'tts_models/bn/custom/vits-female', 'tts_models/be/common-voice/glow-tts']
 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
 > Using model: xtts
 > Text splitted to sentences.
['Hello world!']
Traceback (most recent call last):
  File "/Users/drew/Desktop/test.py", line 16, in <module>
    wav = tts.tts(text="Hello world!", speaker_wav="1.wav", language="en")
  File "/opt/miniconda3/envs/VoxNovel/lib/python3.10/site-packages/TTS/api.py", line 364, in tts
    wav = self.synthesizer.tts(
  File "/opt/miniconda3/envs/VoxNovel/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 383, in tts
    outputs = self.tts_model.synthesize(
  File "/opt/miniconda3/envs/VoxNovel/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 397, in synthesize
    return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
  File "/opt/miniconda3/envs/VoxNovel/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 419, in inference_with_config
    return self.full_inference(text, ref_audio_path, language, **settings)
  File "/opt/miniconda3/envs/VoxNovel/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/miniconda3/envs/VoxNovel/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 488, in full_inference
    return self.inference(
  File "/opt/miniconda3/envs/VoxNovel/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/miniconda3/envs/VoxNovel/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 532, in inference
    text_tokens = torch.IntTensor(self.tokenizer.encode(sent, lang=language)).unsqueeze(0).to(self.device)
  File "/opt/miniconda3/envs/VoxNovel/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 653, in encode
    return self.tokenizer.encode(txt).ids
AttributeError: 'NoneType' object has no attribute 'encode'
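
Both tracebacks stop at the same spot: the tokenizer's inner object is None by the time encode is called, and in this thread that was traced to the vocab file not having the name the loader expects. A pre-flight check along these lines could surface the mismatch before inference starts. This is only a sketch: `require_vocab_file` and its parameters are hypothetical names, not part of TTS or VoxNovel, and the expected file name is passed in because it differs between setups.

```python
from pathlib import Path


def require_vocab_file(model_dir, expected_name):
    """Return the vocab path, or raise a readable error if it is missing.

    Hypothetical helper: `expected_name` is whatever name your loader
    looks for (an assumption supplied by the caller, not read from TTS).
    """
    model_dir = Path(model_dir)
    vocab_path = model_dir / expected_name
    if vocab_path.is_file():
        return vocab_path
    # List look-alike files so a naming mismatch is obvious in the message.
    candidates = sorted(p.name for p in model_dir.glob("vocab*") if p.is_file())
    raise FileNotFoundError(
        f"Expected {expected_name!r} in {model_dir}, "
        f"found: {candidates or 'no vocab files'}"
    )
```

Calling it right before loading the custom model would turn the AttributeError above into a message that names the file that is missing.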

FIX:

- If you imported it into VoxNovel, then your imported custom model folder should be inside the folder VoxNovel/tortoise/voices/Jay Snyder.M/model/{the custom model for that voice is imported here}

That should fix your issue. :)
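
To double-check that folder, a small sketch like the following can print what actually landed in it. It assumes only the layout named above (tortoise/voices/<voice actor>/model under the VoxNovel checkout); `voice_model_dir` and `list_model_files` are hypothetical helper names, not functions from VoxNovel.

```python
from pathlib import Path


def voice_model_dir(voxnovel_root, voice_actor):
    """Build the per-voice model folder path described above."""
    return Path(voxnovel_root) / "tortoise" / "voices" / voice_actor / "model"


def list_model_files(voxnovel_root, voice_actor):
    """Return sorted file names in the voice's model folder, or [] if it is missing."""
    folder = voice_model_dir(voxnovel_root, voice_actor)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains VoxNovel/.
    print(list_model_files("VoxNovel", "Jay Snyder.M"))
```

An empty list means the folder is missing or empty, which points to the import step rather than the vocab file name.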


Ignore everything below, lol. It's extra, just in case. Your issue is MOST LIKELY the thing above. :)

If not:

Try using my Docker image to fine-tune a model locally (if you have an Nvidia graphics card): https://hub.docker.com/repository/docker/athomasson2/fine_tune_xtts/general

If you don't have an Nvidia graphics card, you could just use my Google Colab notebook tho: https://colab.research.google.com/drive/1sqQqzupo2pdjgggkrbM60sU6sBFYo3su?usp=sharing

DrewThomasson commented 1 month ago

Hit me up if you run into any other issues! :)

DrewThomasson commented 1 month ago

Confirmed to fix the issue on my end!

As you can see here, all I did was change the name of the vocab file from "vocab.json" to "vocab.json_", and now it works :)

In fact, here's your model back with that change (temp link, it will self-destruct after one download): https://file.io/prtxa7X9bBg0

Along with a sample generated with that model and voice to prove it worked: Example_Generated_audio_audio_0_0.wav.zip

DrewThomasson commented 1 month ago

BETTER UPDATE!!

I just updated VoxNovel's gui_run.py code to automatically rename your vocab file for you when it sees that it's incorrectly named.

So just do a git pull and you should be good to go :)
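
For anyone curious what an automatic rename like that can look like, here is a minimal sketch. It is not the code that went into gui_run.py; `normalize_vocab_name` is a hypothetical name, and the default file names are simply the two mentioned in this thread.

```python
from pathlib import Path


def normalize_vocab_name(model_dir, wrong="vocab.json", right="vocab.json_"):
    """Rename `wrong` to `right` inside model_dir.

    Returns True if a rename happened, False if nothing needed doing.
    Hypothetical helper; the defaults are the names discussed in this thread.
    """
    model_dir = Path(model_dir)
    src, dst = model_dir / wrong, model_dir / right
    if dst.exists() or not src.is_file():
        # Already correct, or nothing to rename: leave the folder untouched.
        return False
    src.rename(dst)
    return True
```

Because it returns False when the target name already exists, it is safe to call on every load without clobbering a correctly named file.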

ScratchMode commented 1 month ago

Thanks again Drew, seems to be working well again.