qmeeus closed this issue 10 months ago.
@qmeeus thanks for the feedback! Please see #12. `WaveformToFbankConverter` now accepts both float and integer sample rates. Note, though, that float sample rates are legitimate, and there are use cases for them. In fact, Kaldi (which torchaudio uses internally) accepts only floats as the sample rate. I hope #12 resolves your issue. I plan to release it as part of v0.1.1 tomorrow morning.
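The following is a minimal sketch of how a converter might accept both integer and float sample rates. It assumes that the value is normalized to a float before being passed to Kaldi-style feature extraction. `normalize_sample_rate` is a hypothetical helper for illustration, not the code merged in #12.

```python
import math
from numbers import Real
from typing import Union


def normalize_sample_rate(sample_rate: Union[int, float]) -> float:
    """Accept an int or a float sample rate and return it as a float.

    Hypothetical helper, not fairseq2's implementation: it only mirrors the
    idea of accepting both types and converting to the float that
    Kaldi-style feature extraction expects.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(sample_rate, bool) or not isinstance(sample_rate, Real):
        raise TypeError(
            f"sample_rate must be an int or float, got {type(sample_rate).__name__}"
        )
    value = float(sample_rate)
    # Reject NaN, infinity, zero and negative values.
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"sample_rate must be a positive finite number, got {sample_rate}")
    return value


print(normalize_sample_rate(16000))    # 16000.0
print(normalize_sample_rate(22050.0))  # 22050.0
```

Normalizing once at the boundary keeps the rest of the pipeline working with a single type, whichever type callers pass in.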
@cbalioglu Thank you for addressing this, and for your explanation!
In `src/fairseq2/data/audio.py`, the `sample_rate` field of `AudioDecoderOutput`, `WaveformToFbankInput` and `WaveformToFbankOutput` is defined as a float rather than an integer. I think this might be an error; git history shows that it used to be an integer. In every other library I know (espnet, fairseq, soundfile, librosa, torchaudio, etc.), the sample rate is assumed to be an integer. That is how it should be: it is the number of frames per second, which must be a whole number.
Here is an example of a problem that could occur (and that I have personally experienced):