152334H / tortoise-tts-fast

Fast TorToiSe inference (5x or your money back!)
GNU Affero General Public License v3.0

How do I use the finetuned model (.pth) inside the Colab notebook? #62

Open pheonis2 opened 1 year ago

pheonis2 commented 1 year ago

I need to load the .pth file into the fast-tortoise repo's Colab notebook. Yes, I'm using a Colab notebook because I don't have a machine with a beefy GPU.

I observed that this code:

from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_audio, load_voice, load_voices

downloads the default autoregressive.pth and the other .pth files to the root/cache/models/ folder inside Colab.

Now I want to use the model I have finetuned in the Colab notebook. How do I do that?

Do I need to replace the autoregressive.pth file inside the root/cache/models/ folder with my finetuned .pth file, renaming it to autoregressive.pth?

Or is there another way to load the .pth file from a preferred location, for example "content/drive/MyDrive/ai-voice-cloning/models/finetuned.pth"?
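
A side note on that Drive path: in Colab, files under content/drive/MyDrive/ only become visible after mounting Google Drive. A minimal sketch using the standard Colab helper, assuming the default mount point /content/drive:

# mount Google Drive so that /content/drive/MyDrive/... paths resolve
from google.colab import drive
drive.mount('/content/drive')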

pheonis2 commented 1 year ago

I found this command and executed it in Colab:

!./scripts/tortoise_tts.py --ar_checkpoint /drive/MyDrive/ai-voice-cloning/finetuned_gpt.pth

It's working, but very slowly. Almost 2 hours have passed, and this is the progress so far:

Downloading the weight of neural vocoder: TFGAN
Weights downloaded in: /root/.cache/voicefixer/synthesis_module/44100/model.ckpt-1490000_trimed.pt Size: 135613039
Downloading the main structure of voicefixer
Weights downloaded in: /root/.cache/voicefixer/analysis_module/checkpoints/vf.ckpt Size: 489307071
reading text from stdin!

And it's stuck there...

eloop001 commented 1 year ago

The download speed for voicefixer is for some reason very low.

HobisPL commented 1 year ago

@pheonis2 You can just use this:

tts = TextToSpeech(ar_checkpoint='Here is the path to the model.')
tts = TextToSpeech(ar_checkpoint='/content/tortoise-tts-fast/27550_gpt.pth')

import torch
import torchaudio
import torch.nn as nn
import torch.nn.functional as F

import IPython

from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_audio, load_voice, load_voices

tts = TextToSpeech(ar_checkpoint='/content/tortoise-tts-fast/27550_gpt.pth')

and then:

text = "Here, provide the text."

preset = "ultra_fast" 
voice = 'custom_voice' 
voice_samples, conditioning_latents = load_voice(voice)
gen = tts.tts_with_preset(text,
                          voice_samples=voice_samples, 
                          conditioning_latents=conditioning_latents, 
                          preset=preset)
torchaudio.save('generated.wav', gen.squeeze(0).cpu(), 24000)
IPython.display.Audio('generated.wav')
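
If 'custom_voice' does not already exist as a voice folder, you can also build the conditioning input directly from a few reference clips. A minimal sketch, assuming the clips live in a hypothetical Drive folder (load_audio at 22050 Hz is what load_voice uses internally in upstream tortoise, as far as I know):

# load a handful of short reference WAV clips and let tortoise derive the latents itself
import glob
from tortoise.utils.audio import load_audio

clip_paths = glob.glob('/content/drive/MyDrive/my_voice/*.wav')   # hypothetical folder of ~10 s clips
voice_samples = [load_audio(p, 22050) for p in clip_paths]

gen = tts.tts_with_preset(text,
                          voice_samples=voice_samples,
                          conditioning_latents=None,   # with voice_samples given, tortoise computes the latents
                          preset=preset)
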
vienduong88 commented 1 year ago

@pheonis2 You can just use this:

tts = TextToSpeech(ar_checkpoint='/content/tortoise-tts-fast/27550_gpt.pth')

Hi, can I ask after how many epochs/iterations you started to see a difference when running inference with your finetuned model? I've just started on my own language model (Vietnamese), and when I run inference with Tortoise I don't hear any difference yet, so I'm not sure my model is being used at all. I trained with a tokenizer I created for my language and also use it at inference, along with basic_cleaners. I would appreciate your help. Thanks! :)

enekochan commented 1 year ago

@pheonis2 look at this message:

reading text from stdin!

It's waiting for text on standard input to speak. It was doing nothing during those 2 hours. You can send text to stdin like this:

!./scripts/tortoise_tts.py --ar_checkpoint /drive/MyDrive/ai-voice-cloning/finetuned_gpt.pth <<< $(echo "This is a test.")
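
Equivalently, as a plain shell variation on the same command (nothing specific to this repo), you can pipe the text in:

!echo "This is a test." | ./scripts/tortoise_tts.py --ar_checkpoint /drive/MyDrive/ai-voice-cloning/finetuned_gpt.pth
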
maepopi commented 1 year ago

Hello! I tried what you proposed, @vienduong88, but Jupyter throws an error saying that ar_checkpoint is an unexpected argument. What am I doing wrong?

Thank you very much

EDIT: it worked after I reinstalled everything; I suppose I still had the original tortoise module installed.