152334H / tortoise-tts-fast

Fast TorToiSe inference (5x or your money back!)
GNU Affero General Public License v3.0

UI Downloaded Clips Cant Be Opened & Results are overwritten when generating #77

Open Acephalia opened 1 year ago

Acephalia commented 1 year ago

Hello, firstly thanks for this amazing project.

I am having an issue where generated outputs keep overwriting each other. I tried to work around this by downloading the clips via the web UI, but the downloaded files cannot be opened and give a "corrupted" error.

Any guidance on the matter would really be appreciated.

Thank you.

Acephalia commented 1 year ago

Made a fork that fixes the download and adds some additional settings to the GUI. Get it here

Still need to fix the overwriting issue. If anyone can help please let me know!
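
One common way to avoid the overwriting described above is to build a unique file name for every generation. The following is a minimal sketch, not the code from either repository: the helper name `unique_output_path` and the naming scheme (voice, seed, timestamp) are assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path


def unique_output_path(out_dir, voice, seed, ext=".wav"):
    """Return a path inside out_dir that does not exist yet (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Voice, seed and a timestamp make collisions unlikely.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    candidate = out_dir / f"{voice}_{seed}_{stamp}{ext}"

    # Same voice, seed and second: append a counter until the name is free.
    counter = 1
    while candidate.exists():
        candidate = out_dir / f"{voice}_{seed}_{stamp}_{counter}{ext}"
        counter += 1
    return candidate
```

Saving each clip to the returned path instead of a fixed name keeps earlier results, at the cost of more disk usage.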

TiptopFunk commented 1 year ago

Thanks for this. I did something similar with the help of ChatGPT to expose a few parameters like Temperature and Length Penalty.
Nice touch making the UI better, but I can't seem to find where to load a custom-trained model.

I also tried to solve the issues below with ChatGPT, but no success, haha:

  1. Display the random seed used for generation beside the clip so we can reproduce the same results.
  2. As you said, fix the file overwriting, which is really annoying.

Acephalia commented 1 year ago

I did fix the random seed, and it is now displayed when generating. I'm close to having preset save and recall as well. TBH I don't mind the fact that it overwrites (so I haven't looked into it yet), because the drive gets clogged up pretty fast. With the multiple downloads fixed, I can now save only the ones I want.
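
Displaying the seed only helps if the same seed can be fed back in. The sketch below shows the general pattern under stated assumptions: `resolve_seed` and `generate_with_seed` are hypothetical names, and Python's `random.Random` stands in for the model's sampling. In a PyTorch application the resolved seed would typically be passed to `torch.manual_seed` instead.

```python
import random


def resolve_seed(seed=None):
    """Return the user's seed, or draw a fresh one when none is given."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**31)
    return seed


def generate_with_seed(seed=None):
    """Run a stand-in 'generation' and return the seed that was used."""
    seed = resolve_seed(seed)
    rng = random.Random(seed)  # stand-in for the model's random sampling
    samples = [rng.random() for _ in range(3)]
    # Returning the seed lets the UI show it next to the clip.
    return seed, samples
```

Calling `generate_with_seed()` once and then again with the returned seed yields identical samples, which is the behaviour wanted for reproducing a clip.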

TiptopFunk commented 1 year ago

I modified your code to use a custom-trained checkpoint, so I can select the GPT checkpoint. Now everything is working great. Thanks so much.
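
A checkpoint selector like the one described needs a list of candidate files to show in the UI. The following is only a sketch of that step: the function name `list_checkpoints`, the directory argument, and the `*.pth` pattern are assumptions, not taken from the linked code.

```python
from pathlib import Path


def list_checkpoints(models_dir, pattern="*.pth"):
    """Return sorted checkpoint paths found in models_dir (hypothetical helper)."""
    models_dir = Path(models_dir)
    if not models_dir.is_dir():
        # A missing folder just means there is nothing to choose from.
        return []
    return sorted(str(p) for p in models_dir.glob(pattern))
```

The returned list could be used as the choices of a dropdown, with the selected path passed to whatever code loads the GPT weights.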

Acephalia commented 1 year ago

@TiptopFunk that’s great! Can you send me the code so I can have a look and implement it? It would be good to have the fine-tuned models loading in there as well.

The CVVP is working fine in my fork. I was having trouble with the mrq version. Did you manage to fix that?

Cheers!

TiptopFunk commented 1 year ago

Yes. I don't code at all; I learned Python for two days a couple of years ago. https://pastebin.com/T1PQdjaE Check lines 187-194 and line 283. I just added this, and it downloads the CVVP model. Then it just works.
I used your repo and it also works, so maybe it has something to do with dependencies. By the way, I'm on Fedora Linux with an AMD GPU.

TiptopFunk commented 1 year ago

Generating autoregressive samples..
100%|█████████████████████████████████████████████| 5/5 [00:13<00:00, 2.76s/it]
Computing best candidates using CLVP 40% and CVVP 60%
100%|█████████████████████████████████████████████| 5/5 [00:00<00:00, 8.06it/s]
Transforming autoregressive outputs into audio..
100%|███████████████████████████████████████████| 30/30 [00:02<00:00, 12.98it/s]
Generating 1 candidates for voice angie (seed=141676669) took 26.36 seconds
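
The log line "CLVP 40% and CVVP 60%" suggests that each candidate gets one score from each model and that the two are blended before the best candidates are picked. A minimal sketch of such a blend follows, assuming a simple linear mix. The function `rank_candidates` and its parameters are illustrative and not the project's API.

```python
def rank_candidates(clvp_scores, cvvp_scores, cvvp_amount=0.6, k=1):
    """Blend two per-candidate scores and return the indices of the top k."""
    if len(clvp_scores) != len(cvvp_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= cvvp_amount <= 1.0:
        raise ValueError("cvvp_amount must be between 0 and 1")

    # Linear mix: cvvp_amount of the CVVP score, the rest from CLVP.
    blended = [
        cvvp_amount * cvvp + (1.0 - cvvp_amount) * clvp
        for clvp, cvvp in zip(clvp_scores, cvvp_scores)
    ]

    # Higher blended score is better; keep the k best candidate indices.
    order = sorted(range(len(blended)), key=lambda i: blended[i], reverse=True)
    return order[:k], blended
```

With `cvvp_amount=0.0` the ranking reduces to CLVP alone, which matches the later log in this thread where only CLVP is mentioned.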

Acephalia commented 1 year ago

Hey, if you are learning something new and fixing things yourself, that's all that matters.

If I'm not mistaken, the error occurs because the load is getting split across the CPU and GPU. Thanks for the file, I will have a look at it. I just wasn't sure how to fix it on the mrq version.

Do you also have a link for the mods you made to include the checkpoints? Thank you!

TiptopFunk commented 1 year ago

Yes, here is my conda env for fast tortoise for your reference: https://pastebin.com/WULQu9Rq

Here is the link to the modified app.py from your fork: https://pastebin.com/nZ1mtvNz

  1. Add code at lines 46-54.
  2. Add # to (comment out) lines 331, 332, 342, and 343.
  3. Add code at lines 340 and 341.

TiptopFunk commented 1 year ago

Thanks for your kind words. This is because I'm on an Intel Mac with an AMD 6800 XT. AI tools like Stable Diffusion and LLMs don't perform well on Mac, so I have to dual-boot Linux and read through lots of posts and documentation. It's quite a journey.

TiptopFunk commented 1 year ago

So I checked the mrq version; I don't know if CVVP works. Here is the output from the command line:

[1/1] Generating line: [I am really happy,] Are you looking to enhance your coding skills?
Loading voice: random with model a325ac0a
Requesting weighing against CVVP weight, but voice latents are missing some extra data. Please regenerate your voice latents with 'Slimmer voice latents' unchecked.
Generating autoregressive samples
Computing best candidates using CLVP
Transforming autoregressive outputs into audio..
Generating line took 25.850035190582275 seconds
/home/xxx/miniconda3/envs/fasttortoise/lib/python3.10/site-packages/torchaudio/functional/functional.py:1458: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
  warnings.warn(
Loading Voicefixer
Loaded Voicefixer
Generation took 33.5921847820282 seconds, saved to './results//random//random_00001_fixed.wav'

Acephalia commented 1 year ago

Hmm. I'm away from my computer tonight. Will check tomorrow and report back.

On another note, have you managed to get any of the prompting, like [I am really happy,], to actually work?

TiptopFunk commented 1 year ago

yeah, i got the same error now:

Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Emotion prompting works on tortoise-tts-fast, but the difference is not that noticeable. I used Audacity to compare outputs with the same seed and settings, with and without emotion prompting.
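
The error quoted above names the input as `torch.FloatTensor` (a CPU tensor) and the weights as `torch.cuda.FloatTensor` (on the GPU), so some input is reaching a GPU-resident module without being moved first. Where in the mrq code this happens is not established in this thread. The sketch below only shows the generic fix of moving the input to the module's device, using a toy `nn.Conv1d` as a stand-in for the real model. The helper name `run_on_model_device` is hypothetical.

```python
import torch
from torch import nn


def run_on_model_device(model, x):
    """Move x to the device holding the model's weights, then run the model."""
    device = next(model.parameters()).device
    return model(x.to(device))


# Toy stand-in for the real network; moved to the GPU only if one is available.
model = nn.Conv1d(1, 4, kernel_size=3)
if torch.cuda.is_available():
    model = model.cuda()

x = torch.randn(2, 1, 16)  # created on the CPU by default
y = run_on_model_device(model, x)
```

Calling `model(x)` directly with the model on the GPU and `x` on the CPU would raise the same kind of type mismatch, while the helper avoids it regardless of where the model lives.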