dubverse-ai / MahaTTS

Apache License 2.0

Sample rate of reference audio for cloning #10

Closed regstuff closed 9 months ago

regstuff commented 10 months ago

Hi, I tried the Colab link to clone a voice with a WAV file. I wasn't able to get things to work with 48 kHz, 16 kHz, or 8 kHz sample-rate files. Any clues as to what the actual format should be? This is the error I get:

RuntimeError                              Traceback (most recent call last)
[<ipython-input-7-9105142690b8>](https://localhost:8080/#) in <cell line: 9>()
      7 ref_clips = glob.glob(path)
      8 
----> 9 audio,sr = infer_tts(text,ref_clips,diffuser_en,diff_model_en,ts_model_en,vocoder_en)
     10 
     11 write('/content/test.wav',sr,audio)

3 frames
[/usr/local/lib/python3.10/dist-packages/maha_tts/utils/stft.py](https://localhost:8080/#) in transform(self, input_data)
     50 
     51         # similar to librosa, reflect-pad the input
---> 52         input_data = input_data.view(num_batches, 1, num_samples)
     53         input_data = F.pad(
     54             input_data.unsqueeze(1),

RuntimeError: shape '[1, 1, 137686]' is invalid for input of size 275372

rasenganai commented 10 months ago

The sampling rate should be 22050 Hz.

sanjay-906 commented 2 months ago

sampling rate should be 22050

My audio files already had a 22050 Hz sampling rate, and I still got the error. It was fixed when I converted my audio from stereo to mono (two channels down to one).
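For anyone hitting the same traceback: the input size in the error (275372) is exactly twice the expected number of samples (137686), which is consistent with a two-channel file being passed where a single channel is expected. Below is a minimal sketch of a stereo-to-mono downmix using only the Python standard library. The function names and the `TARGET_RATE` constant are hypothetical and not part of MahaTTS, the sketch assumes 16-bit PCM WAV input, and it does not resample: it only returns the file's sample rate so you can compare it against the 22050 Hz mentioned above.

```python
import array
import sys
import wave

# Rate suggested in this thread; not verified against the MahaTTS source.
TARGET_RATE = 22050


def downmix_to_mono(interleaved, n_channels):
    """Average interleaved 16-bit samples across channels into one channel."""
    if n_channels == 1:
        return array.array("h", interleaved)
    n_frames = len(interleaved) // n_channels
    mono = array.array("h")
    for i in range(n_frames):
        frame = interleaved[i * n_channels:(i + 1) * n_channels]
        # The mean of int16 values always fits back into int16.
        mono.append(sum(frame) // n_channels)
    return mono


def convert_wav_to_mono(src_path, dst_path):
    """Write a mono copy of a 16-bit PCM WAV file and return its sample rate."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        n_channels = src.getnchannels()
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    mono = downmix_to_mono(samples, n_channels)
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
    return rate
```

A possible way to use it: call `convert_wav_to_mono("ref_stereo.wav", "ref_mono.wav")` (both paths are placeholders) and, if the returned rate differs from `TARGET_RATE`, resample the file with an audio tool of your choice before passing it as a reference clip.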