FFT compatible with OpenAI Whisper features

Currently, only the radix-2 FFT algorithm is implemented. In fact, this is the first time I've seen an FFT of non-power-of-2 size used in a DSP/AI framework. In general, it's possible to implement a slower FFT/log-mel filterbank with an FFT of size 400, but that will take some time. Meanwhile, the following extractor produces results as close to Whisper's as possible (although slightly different). It uses FrameSize=400 and FFTSize=512, which is how it's usually done:

var samplingRate = 16000;

var bands = FilterBanks.MelBandsSlaney(80, samplingRate);
var filterbank = FilterBanks.MelBankSlaney(80, 512, samplingRate);

var options = new FilterbankOptions
{
    SamplingRate = samplingRate,
    FilterBank = filterbank,
    FrameSize = 400,
    HopSize = 160,
    Window = WindowType.Hann,
    SpectrumType = SpectrumType.Power,
    NonLinearity = NonLinearityType.Log10,
    LogFloor = 1e-10f,
};

var extractor = new FilterbankExtractor(options);
var vectors = extractor.ComputeFrom(signal);
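
One reason the output can only be close to Whisper's, not identical, is the FFT size itself. The arithmetic below is a small sketch in plain Python (the helper names are illustrative and are not part of NWaves or Whisper): zero-padding a 400-sample frame to 512 changes both the number of spectrum bins and the spacing between them, so the mel filters are applied to a differently sampled spectrum.

```python
SAMPLING_RATE = 16000
FRAME_SIZE = 400


def next_power_of_two(n):
    """Smallest power of two that is >= n (the size a radix-2 FFT needs)."""
    return 1 << (n - 1).bit_length()


def rfft_bins(fft_size):
    """Number of non-redundant bins in the spectrum of a real-valued signal."""
    return fft_size // 2 + 1


def bin_spacing_hz(fft_size, sampling_rate=SAMPLING_RATE):
    """Distance in Hz between neighbouring FFT bins."""
    return sampling_rate / fft_size


padded_size = next_power_of_two(FRAME_SIZE)

# 400-point FFT (frame size used directly as FFT size)
bins_400, spacing_400 = rfft_bins(FRAME_SIZE), bin_spacing_hz(FRAME_SIZE)

# 512-point FFT (frame zero-padded for radix-2)
bins_512, spacing_512 = rfft_bins(padded_size), bin_spacing_hz(padded_size)

print(padded_size)            # 512
print(bins_400, spacing_400)  # 201 40.0
print(bins_512, spacing_512)  # 257 31.25
```

This only accounts for the spectrum resolution. Other details, such as how the signal is padded at its edges, may also contribute to the difference.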

After extracting the vectors, you'll need to post-process them the same way Whisper's code does (its last two steps):

log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
log_spec = (log_spec + 4.0) / 4.0
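
For porting these two steps to code that works on a list of feature vectors (one array of 80 values per frame), here is a minimal sketch in plain Python without torch. The function name `whisper_postprocess` and the sample numbers are made up for illustration. Note that `log_spec.max()` is taken over the whole spectrogram, so the maximum must be computed across all frames rather than per frame. Both steps are element-wise, so it should not matter whether the data is stored frame-major or band-major.

```python
def whisper_postprocess(vectors):
    """Apply the two final steps to a list of log10 feature vectors.

    1. Clamp every value to at least (global maximum - 8.0).
    2. Rescale with (x + 4.0) / 4.0.
    """
    # Maximum over ALL frames and bands, mirroring log_spec.max()
    global_max = max(max(frame) for frame in vectors)
    floor = global_max - 8.0
    return [[(max(v, floor) + 4.0) / 4.0 for v in frame] for frame in vectors]


# Tiny made-up example: 2 frames with 3 values each
frames = [[-10.0, -3.0, 0.0],
          [-9.0, -1.0, 2.0]]
processed = whisper_postprocess(frames)
print(processed)  # [[-0.5, 0.25, 1.0], [-0.5, 0.75, 1.5]]
```

The same loop translates directly to C#: find the maximum over all vectors first, then update each value in place.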

ar1st0crat / NWaves
