JarodMica / audiobook_maker

GNU General Public License v3.0

Program generates one sentence and then crashes, doesn't save progress #20

Closed Lilliva closed 9 months ago

Lilliva commented 9 months ago

Here is what pops up in the command window after it generates the first sentence and saves the output in the results folder:

C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\runtime\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Traceback (most recent call last):
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\audio_book_app_2_0.py", line 46, in run
    self.function(self.directory_path, self.report_progress)
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\audio_book_app_2_0.py", line 753, in generate_audio_for_sentence_threaded
    audio_path = self.generate_audio(sentence)
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\audio_book_app_2_0.py", line 797, in generate_audio
    audio_path = rvc_convert(model_path=voice_model_path,
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\runtime\lib\site-packages\rvc_pipe\rvc_infer.py", line 121, in rvc_convert
    vc.get_vc(model_path)
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\rvc\infer\modules\vc\modules.py", line 110, in get_vc
    self.tgt_sr = self.cpt["config"][-1]
KeyError: 'config'

I've tried looking around in modules.py, where the KeyError: 'config' is coming from, but I'm still a bit new to coding and I haven't noticed what's wrong. I thought it was talking about the config.json file from the RVC logs, but I'm not sure where I need to put it or if it's related at all. I've also looked into _utils.py, which is mentioned near the top of the output, to see if that was part of the problem, but I couldn't find any "tensor.storage()" to replace.

Probably important to note is that even though the first line is generated and saved, the progress isn't. When I try to continue generation upon next launch, it generates the first line again, then crashes.

Any recommendations of what I should do from here? Thank you.

JarodMica commented 9 months ago

A KeyError like this usually comes from a model incompatibility or something wrong with the voice model getting passed in. I would double check that it's an RVC V1 or V2 model, but if it's the one I provided (azasu.pth), it should be fine.
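The traceback above shows `get_vc` reading `self.cpt["config"][-1]`, so any checkpoint without a top-level `config` entry fails in exactly this way. Below is a minimal sketch for checking a file before copying it into voice_models. It assumes the checkpoint is a plain dictionary that `torch.load` can read; `has_rvc_config` and `inspect_checkpoint` are hypothetical helper names and are not part of audiobook_maker or RVC.

```python
from collections.abc import Mapping


def has_rvc_config(checkpoint) -> bool:
    """Return True if the loaded checkpoint has the top-level 'config'
    entry that get_vc reads in the traceback above."""
    return isinstance(checkpoint, Mapping) and "config" in checkpoint


def inspect_checkpoint(path: str) -> bool:
    """Load a .pth file on the CPU, print its top-level keys, and report
    whether the 'config' entry is present (hypothetical helper)."""
    import torch  # imported lazily so has_rvc_config works without torch

    # Only load checkpoints you trust: torch.load unpickles the file.
    checkpoint = torch.load(path, map_location="cpu")
    if isinstance(checkpoint, Mapping):
        print("top-level keys:", list(checkpoint.keys()))
    return has_rvc_config(checkpoint)
```

If `inspect_checkpoint` returns False for a file, loading that file in the audiobook maker would be expected to raise the same KeyError: 'config'.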

Inside the audiobooks folder, do you see any audio files in the folder for your book?

Lilliva commented 9 months ago

No, there isn't. I'm using models that I trained, so does that mean I need to retrain them as V1?

JarodMica commented 9 months ago

If it's your models, make sure it's either a V1 or V2 RVC model that you're putting inside of the voice_models folder. Now, if you have already retrained them, then it's going to be a different issue.

If you're training a voice in tortoise, that stays in tortoise.

DanielTheurich commented 9 months ago

I am having the exact same issue as Lilliva.

I was able to set up and run both Tortoise TTS and RVC and train my own models by following Jarod's tutorials. (Thanks for making those!) However, I'm not sure how to get around this issue when trying to use the audiobook maker. The model I copied into the audiobook_maker > voice_models folder was named "D_1550.pth" (it is about 817 MB). I got this model from the RVC1006Nvidia > logs > "Model Name" folder.

The error I get when trying to run it is the same one Lilliva had: KeyError: 'config'

I'm just wondering if maybe I am using the wrong .pth file? Or if the file is too big?

(Note: I know that both of my tts and rvc models work since I was able to run the tts and then run that audio through rvc to make it sound better.)

Lilliva commented 9 months ago

Okay, I tried using a .pth file from the rvc weights folder. I can't tell if I solved my initial issue because a new error comes up:

C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\runtime\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
2023-10-19 22:38:20 | WARNING | rvc.infer.modules.vc.modules | Traceback (most recent call last):
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\rvc\infer\lib\audio.py", line 54, in load_audio
    with open(file, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './results//IvyEngage//IvyEngage_00000.wav'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\rvc\infer\modules\vc\modules.py", line 171, in vc_single
    audio = load_audio(input_audio_path, 16000)
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\rvc\infer\lib\audio.py", line 66, in load_audio
    raise RuntimeError(f"Failed to load audio: {e}")
RuntimeError: Failed to load audio: [Errno 2] No such file or directory: './results//IvyEngage//IvyEngage_00000.wav'

Traceback (most recent call last):
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\audio_book_app_2_0.py", line 46, in run
    self.function(self.directory_path, self.report_progress)
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\audio_book_app_2_0.py", line 753, in generate_audio_for_sentence_threaded
    audio_path = self.generate_audio(sentence)
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\audio_book_app_2_0.py", line 797, in generate_audio
    audio_path = rvc_convert(model_path=voice_model_path,
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\runtime\lib\site-packages\rvc_pipe\rvc_infer.py", line 124, in rvc_convert
    wavfile.write(output_file_path, tgt_sr, audio_opt)
  File "C:\Users\name\OneDrive\Documents\audiobook_maker_v1.0\runtime\lib\site-packages\scipy\io\wavfile.py", line 772, in write
    dkind = data.dtype.kind
AttributeError: 'NoneType' object has no attribute 'dtype'

From what I can understand, it is generating a sentence like before, but now it can't find the file to continue the rest of what it is supposed to do.

I know the file exists and is playable because I looked at the corresponding results folder and found it there, saying the sentence it was supposed to say. I'm guessing the latter half of the error results from the program not being able to find the file. The only program that has a "results" folder is TTS, so that's where I checked. I noticed that the audiobook maker has an output folder, and every time I retry generation, there is a new .wav file in this output folder that can't be played because it is either corrupted or unsupported. I'm not sure if this is related or not.

Should I redirect the results folder to somewhere the audiobook maker program can find? If so, where should that be? If not, what should I try from here? Thanks for your time
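The log above is consistent with that guess: the FileNotFoundError is logged as a warning instead of stopping the run, and the later `wavfile.write` call then receives `None` in place of audio data. Here is a small sketch of a check that fails earlier and names the folder the lookup started from. `require_audio_file` is a hypothetical helper and is not part of the app.

```python
import os


def require_audio_file(path: str) -> str:
    """Return the absolute path of an existing audio file, or raise
    FileNotFoundError naming the directory the lookup started from."""
    absolute = os.path.abspath(path)
    if not os.path.isfile(absolute):
        raise FileNotFoundError(
            f"{path!r} resolved to {absolute!r} (cwd: {os.getcwd()!r}), "
            "which does not exist"
        )
    return absolute
```

Calling this on the path returned by the TTS step, before any conversion, would point at the working-directory mismatch directly.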

JarodMica commented 9 months ago

Hey @DanielTheurich, make sure you're grabbing the .pth file from assets/weights. That is the one you want, and it should be named after the experiment name you specified in RVC. This will resolve your issue.

@Lilliva, make sure you change the results path in Tortoise to an absolute path, as shown at around 6:30 in the audiobook maker video. You've still got the relative path in there, as it's showing here: './results//IvyEngage//IvyEngage_00000.wav'. Once you fix that, it should be good to go.
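A relative path such as './results//IvyEngage//IvyEngage_00000.wav' is resolved against the working directory of whichever process opens it, so Tortoise and the audiobook maker end up looking in different folders. The sketch below illustrates this with `ntpath`, which applies Windows path rules on any OS. The two install folders are hypothetical and only serve the illustration.

```python
import ntpath  # Windows path rules, usable on any OS


def resolve_from(working_dir: str, reported_path: str) -> str:
    """Resolve a path the way a process started in working_dir would."""
    return ntpath.normpath(ntpath.join(working_dir, reported_path))


reported = "./results//IvyEngage//IvyEngage_00000.wav"

# Hypothetical install locations, for illustration only.
seen_by_tortoise = resolve_from(r"C:\ai-voice-cloning", reported)
seen_by_audiobook_maker = resolve_from(r"C:\audiobook_maker_v1.0", reported)

print(seen_by_tortoise)
print(seen_by_audiobook_maker)
```

The two printed paths differ, which is why the file exists for Tortoise but not for the audiobook maker. When the reported path is already absolute, `ntpath.join` leaves it unchanged, so both programs agree on the location.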

Lilliva commented 9 months ago

Okay! That fixes this issue!

I think TTS is working fine, but the audio files produced in the audiobook folder are just white noise, and each is 3-4 hours long. Since I don't think this is related to my first problem, I'll tackle it later after some rest and ask for help if I've gotten nowhere.

I'll leave this open just in case Daniel gets a different issue.

Thank you so much!

DanielTheurich commented 9 months ago

Thanks @JarodMica. I was just using the wrong file. Now when I use the one from assets/weights it works. Everything works smoothly now. Thanks again!

JarodMica commented 9 months ago

Awesome, issue is resolved, closing

athos54 commented 9 months ago

Hi there, good job :)

I have a similar problem. I trained a model as you show in this video: https://www.youtube.com/watch?v=6sTsqSQYIzs&t=1140s

When training finished, I was able to use the model in ai-voice-cloning.

I tried copying the model from ai-voice-cloning to audiobook_maker,

and then I got this error:

2023-10-26 18:47:42 | INFO | rvc.configs.config | Found GPU NVIDIA GeForce RTX 3090
Calling API with sentence: <The story we are about to embark upon takes us into a world of unlimited possibilities and borderless dreams, where a child born in Albuquerque, New Mexico, in 1964, was destined to reshape the globe.>
API response received with audio path: E:\ai-voice-cloning\results\random/samantha//samantha_00030.wav
E:\audiobook_maker-1.0\venv\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Traceback (most recent call last):
  File "E:\audiobook_maker-1.0\audio_book_app_2_0.py", line 29, in run
    self.function(self.directory_path, self.report_progress)
  File "E:\audiobook_maker-1.0\audio_book_app_2_0.py", line 644, in generate_audio_for_sentence_threaded
    audio_path = self.generate_audio(sentence)
  File "E:\audiobook_maker-1.0\audio_book_app_2_0.py", line 688, in generate_audio
    audio_path = rvc_convert(model_path=voice_model_path,
  File "E:\audiobook_maker-1.0\venv\lib\site-packages\rvc_pipe\rvc_infer.py", line 121, in rvc_convert
    vc.get_vc(model_path)
  File "E:\audiobook_maker-1.0\rvc\infer\modules\vc\modules.py", line 110, in get_vc
    self.tgt_sr = self.cpt["config"][-1]
KeyError: 'config'

I think I have the absolute path set correctly, because audiobook_maker works correctly with another model.

JarodMica commented 9 months ago

Hey Athos, the voice models that go into the audiobook maker are RVC voices, not the Tortoise ones. You need to either train an RVC voice or obtain one online and then put it there. The Tortoise ones you leave in Tortoise.

athos54 commented 9 months ago

@JarodMica thanks for the reply. I don't think I understand very well how it works. Do you have a video explaining the differences?

So I need to train an RVC model on one side and a separate model for Tortoise? Two different models?

If I want to read a Spanish text, how can I do it?

JarodMica commented 9 months ago

I explain a bit in the video, but I'll also be brief here; your understanding is correct. They currently function as two separate systems: Tortoise TTS and Audiobook Maker

The audiobook maker makes a call to Tortoise to generate audio for the text, and Tortoise returns just the audio file to the audiobook maker. The audiobook maker then uses an RVC model to convert that audio into a specific voice, and that is the final output you see.
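The two stages described above can be sketched as a simple series pipeline. This is an illustration under stated assumptions and is not the app's actual code: `make_sentence_audio`, `generate_speech`, `convert_voice`, and the two stand-in functions are hypothetical names, and the stand-ins only build path strings without producing any audio.

```python
from typing import Callable


def make_sentence_audio(
    sentence: str,
    generate_speech: Callable[[str], str],
    convert_voice: Callable[[str], str],
) -> str:
    """Run the two stages in series and return the final audio path."""
    tts_audio_path = generate_speech(sentence)  # stage 1: TTS speaks the text
    return convert_voice(tts_audio_path)        # stage 2: voice conversion


# Stand-in stages, for illustration only.
def fake_tortoise(sentence: str) -> str:
    return f"/abs/results/{len(sentence):05d}.wav"


def fake_rvc(audio_path: str) -> str:
    return audio_path.replace(".wav", "_rvc.wav")


print(make_sentence_audio("hello", fake_tortoise, fake_rvc))
```

Because the second stage only ever sees a file path, each stage needs its own trained model, and the path handed over has to be one that the second process can actually open.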

Tortoise models are incompatible with RVC models because they are completely different architectures, so I have to combine them in series. Due to this, you train a Tortoise model and then an RVC model; there's no combined solution for training both as a single model. Each architecture needs to be trained on its own, unfortunately, as of right now.

Additionally, think of Tortoise as matching prosody and RVC as matching the voice's tone.

If you can train a Spanish model in Tortoise, you can select that model to be used. That way you can make calls to Tortoise with your Spanish model. As for training other languages, you'll need your own tokenizer for that.

athos54 commented 9 months ago

Thanks, I think I understand a little bit more now. I'm going to run some tests. Thanks again :)