Sample rate of reference audio for cloning

regstuff commented 10 months ago

Hi, I tried the Colab link to clone a voice from a WAV file, but I wasn't able to get it to work with 48 kHz, 16 kHz, or 8 kHz sample-rate files. Any clues as to what the actual format should be? This is the error I get:

RuntimeError                              Traceback (most recent call last)
<ipython-input-7-9105142690b8> in <cell line: 9>()
      7 ref_clips = glob.glob(path)
      8 
----> 9 audio,sr = infer_tts(text,ref_clips,diffuser_en,diff_model_en,ts_model_en,vocoder_en)
     10 
     11 write('/content/test.wav',sr,audio)

3 frames
/usr/local/lib/python3.10/dist-packages/maha_tts/utils/stft.py in transform(self, input_data)
     50 
     51         # similar to librosa, reflect-pad the input
---> 52         input_data = input_data.view(num_batches, 1, num_samples)
     53         input_data = F.pad(
     54             input_data.unsqueeze(1),

RuntimeError: shape '[1, 1, 137686]' is invalid for input of size 275372

rasenganai commented 10 months ago

sampling rate should be 22050

sanjay-906 commented 2 months ago

> sampling rate should be 22050

My audio files already had a 22050 Hz sampling rate and I still got the error. It was fixed when I converted the audio from stereo to mono.
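For anyone hitting the same error: the input size in the traceback (275372) is exactly twice the expected 137686 samples, which is consistent with a two-channel file. Below is a minimal sketch of a stereo-to-mono downmix using only the Python standard library. It assumes the reference clip is 16-bit PCM WAV. The function name `to_mono_16bit` and the file paths are hypothetical and not part of MahaTTS. It does not resample, so the returned sample rate still needs to be checked against 22050.

```python
import array
import sys
import wave


def to_mono_16bit(src_path, dst_path):
    """Downmix a 16-bit PCM WAV file to mono by averaging its channels.

    Returns the sample rate, which is left unchanged (no resampling).
    Hypothetical helper for illustration; not part of MahaTTS.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        sample_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV files")

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved per frame (L, R, L, R, ...); average each frame.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + n_channels]) // n_channels
            for i in range(0, len(samples), n_channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()  # convert back to little-endian before writing

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(sample_rate)
        dst.writeframes(mono.tobytes())
    return sample_rate
```

A possible way to use it (the paths are placeholders): call `sr = to_mono_16bit("/content/ref_stereo.wav", "/content/ref_mono.wav")` and confirm that `sr == 22050` before passing the mono file as a reference clip. If the rate differs, the file also needs resampling with an audio tool of your choice.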

dubverse-ai / MahaTTS

Sample rate of reference audio for cloning #10