Hi,
I tried the Colab link to clone a voice from a WAV file, but couldn't get it to work with 48 kHz, 16 kHz, or 8 kHz sample rate files. Any clues as to what the input format should actually be?
This is the error I get:
RuntimeError Traceback (most recent call last)
<ipython-input-7-9105142690b8> in <cell line: 9>()
7 ref_clips = glob.glob(path)
8
----> 9 audio,sr = infer_tts(text,ref_clips,diffuser_en,diff_model_en,ts_model_en,vocoder_en)
10
11 write('/content/test.wav',sr,audio)
3 frames
/usr/local/lib/python3.10/dist-packages/maha_tts/utils/stft.py in transform(self, input_data)
50
51 # similar to librosa, reflect-pad the input
---> 52 input_data = input_data.view(num_batches, 1, num_samples)
53 input_data = F.pad(
54 input_data.unsqueeze(1),
RuntimeError: shape '[1, 1, 137686]' is invalid for input of size 275372
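One thing that stands out in the error: 275372 is exactly 2 × 137686, which would fit a two-channel (stereo) file reaching code that expects a single channel. That is only a guess from the numbers, not something confirmed by the MahaTTS code or docs. If it is right, the sample rate may not be the issue and downmixing the reference clip to mono might be enough. Below is a minimal sketch of that downmix using only the Python standard library; it assumes 16-bit PCM WAV input, and the function names and file paths are hypothetical, not part of MahaTTS.

```python
import array
import sys
import wave


def downmix_interleaved(samples, channels):
    """Average interleaved PCM samples across channels into one mono channel."""
    if channels < 1 or len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    return [
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ]


def to_mono_wav(src_path, dst_path):
    """Rewrite a 16-bit PCM WAV file as mono, keeping its sample rate."""
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        rate = src.getframerate()
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        pcm = array.array("h", src.readframes(src.getnframes()))

    # WAV data is little-endian; swap on big-endian hosts before doing math.
    if sys.byteorder == "big":
        pcm.byteswap()

    mono = array.array("h", downmix_interleaved(pcm, channels))

    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
    return channels, rate
```

Usage would be something like `to_mono_wav('/content/ref_stereo.wav', '/content/ref_mono.wav')` (both paths are placeholders) before pointing `glob.glob(path)` at the mono file. The returned channel count also shows whether the original clip was stereo in the first place.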