Closed jodusan closed 4 years ago
Hi! I'd suggest checking the wiki first. It has a paragraph on librosa.
Still, there are a couple of important nuances in librosa:
1) htk = True or False. This parameter essentially defines the weights of the mel filterbank (HTK-style or Slaney-style).
2) center. In NWaves, as in many other frameworks, frames are not centered the way they are in librosa (in fact, I don't quite understand its purpose...), so this parameter must be set to False.
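To make the htk point more concrete, here is a minimal sketch of the two Hz-to-mel conversions usually meant by "HTK-style" and "Slaney-style". The function names are illustrative only and belong to neither library; the constants are the commonly published ones, and I'm assuming they match what librosa uses. The filter weights differ because the band edges are placed on different scales.

```python
import math

def hz_to_mel_htk(hz):
    """HTK-style mel scale: logarithmic over the whole range."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def hz_to_mel_slaney(hz):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0                # Hz per mel in the linear region
    min_log_hz = 1000.0               # where the logarithmic region starts
    min_log_mel = min_log_hz / f_sp   # 15 mel at 1 kHz
    logstep = math.log(6.4) / 27.0    # step size in the logarithmic region
    if hz < min_log_hz:
        return hz / f_sp
    return min_log_mel + math.log(hz / min_log_hz) / logstep

# Compare both scales at the band edges used in the example (100 Hz, 8000 Hz)
for f in (100.0, 1000.0, 8000.0):
    print(f, round(hz_to_mel_htk(f), 2), round(hz_to_mel_slaney(f), 2))
```

The absolute mel values are on different scales, so only the spacing of the band centers matters when the filterbank is built.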
Let's just consider an example:
int sr = 22050; // sampling rate
int fftSize = 1024;
double lowFreq = 100; // if not specified, will be 0
double highFreq = 8000; // if not specified, will be samplingRate / 2
int filterbankSize = 40; // or 24 for htk=true (usually)
// if the 'htk' parameter in librosa is set to False:
var melBank1 = FilterBanks.MelBankSlaney(filterbankSize, fftSize, sr, lowFreq, highFreq);
// if the 'htk' parameter in librosa is set to True:
var melBands = FilterBanks.MelBands(filterbankSize, sr, lowFreq, highFreq);
var melBank2 = FilterBanks.Triangular(fftSize, sr, melBands, null, Scale.HerzToMel);
var opts = new MfccOptions
{
SamplingRate = sr,
FrameDuration = (double)fftSize / sr,
HopDuration = 0.010,
FeatureCount = 13, // must equal n_mfcc in the librosa call
Filterbank = melBank1, // or melBank2
NonLinearity = NonLinearityType.ToDecibel, // mandatory
Window = WindowTypes.Hamming, // librosa's default is 'hann'
LogFloor = 1e-10f, // mandatory
DctType="2N",
LifterSize = 0
};
var e = new MfccExtractor(opts);
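The two options marked "mandatory" above correspond to a log-power (decibel) step with a floor. Below is a minimal sketch of that step, assuming the conversion is 10 * log10(max(x, floor)). `power_to_db_floor` is a hypothetical helper written for illustration, not a function of NWaves or librosa. As far as I know, librosa's `power_to_db` also clips the result with `top_db` (80 dB by default), which this sketch leaves out.

```python
import math

def power_to_db_floor(power, floor=1e-10):
    """Convert a power value to decibels, clamping tiny values to `floor`."""
    return 10.0 * math.log10(max(power, floor))

# Hypothetical mel-band energies, including an exact zero and a value below the floor
mel_energies = [0.0, 1e-12, 1e-3, 1.0, 250.0]
print([round(power_to_db_floor(e), 2) for e in mel_energies])
```

Without the floor, a silent band would produce log10(0); with a different floor, the quietest bands land at different dB values and the coefficients drift apart.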
In librosa:
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                             dct_type=2, norm='ortho', window='hamming',
                             htk=False, n_mels=40, fmin=100, fmax=8000,
                             n_fft=1024, hop_length=int(0.010*sr), center=False)
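A quick way to see what center changes is to count frames. This is a rough sketch with a hypothetical `frame_count` helper, assuming that centered framing pads n_fft // 2 samples on each side of the signal and that non-centered framing keeps only frames that fit entirely inside it. The numbers are the ones from the call above, applied to one second of audio.

```python
def frame_count(num_samples, n_fft, hop_length, center):
    """Number of analysis frames for a signal of `num_samples` samples."""
    if center:
        # centered framing pads n_fft // 2 samples on each side
        num_samples += 2 * (n_fft // 2)
    if num_samples < n_fft:
        return 0
    return 1 + (num_samples - n_fft) // hop_length

sr = 22050
n_fft = 1024
hop_length = int(0.010 * sr)   # 10 ms hop, truncated to a whole number of samples

print(hop_length)
print(frame_count(sr, n_fft, hop_length, center=False))
print(frame_count(sr, n_fft, hop_length, center=True))
```

If the two libraries return a different number of frames for the same signal, the centering setting or the hop rounding is the first thing to check.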
There are actually even more options. Feel free to ask if you have any questions.
@ar1st0crat Thank you for the detailed response!
I'm trying to replicate what NWaves does using the Python librosa library. I've seen comments about it around the code, so I guess the two have been compared before. Do you have any hints on how to generate MFCCs in librosa that look similar to the ones from NWaves? Any pointers on what to look out for?
Thanks