Closed qmeeus closed 10 months ago
Interesting. Have you perhaps tried giving the path to the wav file directly to the predict method? Also, you can resample like this:
import torchaudio

resample_rate = 16000
waveform, sample_rate = torchaudio.load("./source.wav")
resampler = torchaudio.transforms.Resample(sample_rate, resample_rate, dtype=waveform.dtype)
waveform = resampler(waveform)
That is not really the point of the issue. What if I want to process the waveform before passing it to the model? The real fix to make the example run is to pass a float value to predict, something along these lines:
translated_text, *_ = translator.predict(waveform, "s2st", "fra", sample_rate=float(sample_rate))
but this does not address the underlying problem.
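To illustrate the underlying problem, here is a minimal sketch in plain Python (a hypothetical simplification, not fairseq2's actual code) of a converter that type-checks sample_rate strictly as float, and why casting the integer returned by torchaudio.load sidesteps the error:

```python
def to_fbank(sample_rate: float) -> float:
    # Simplified stand-in for a converter that enforces its annotated
    # type at runtime: an int is rejected even though every integer
    # sample rate is also a valid float value.
    if not isinstance(sample_rate, float):
        raise TypeError(
            f"sample_rate must be a float, got {type(sample_rate).__name__}"
        )
    return sample_rate


sample_rate = 44100  # torchaudio.load returns an int here
try:
    to_fbank(sample_rate)
except TypeError as err:
    print(err)  # sample_rate must be a float, got int

print(to_fbank(float(sample_rate)))  # 44100.0
```

Casting at the call site works around the symptom, but accepting both int and float in the converter itself is the cleaner fix.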
Thank you for the suggestion
(Copy pasting from fairseq2 issue)
@qmeeus thanks for the feedback! Please see #12. WaveformToFbankConverter now accepts both float and integer sample rates. Note though that float sample rates are legitimate and there are use cases for them; in fact Kaldi (which torchaudio uses internally) accepts only floats as sample rates. Hope #12 resolves your issue. I plan to release it as part of v0.1.1 tomorrow morning.
In src/fairseq2/data/audio.py, AudioDecoderOutput, WaveformToFbankInput and WaveformToFbankOutput define sample_rate as a float and not as an integer. This leads to an error when executing this code sample:
Although I think this is a mistake in the fairseq2 code base, I am reporting it here as well since fairseq2 is listed as a dependency. Should they decide not to fix it, the workaround is to cast the sample_rate to a float in this line
I have reported the issue in the fairseq2 repo as well.