Comparison of MFCC computation between librosa and essentia, for acoustic scene classification (MTG/essentia #525)

edufonseca commented 7 years ago

I compared the MFCC computation in librosa and essentia, using data from the DCASE 2016 challenge and its baseline system (MFCC + GMM) for Task 1, acoustic scene classification.

Procedure:

Matched the common input parameters in both libraries and used the same signal framing.
Edited minor differences in librosa (20 log10 and truncation of the lowest amplitude values) so that both libraries apply the same amplitude treatment.
Did not look into the filterbank (in theory, both are based on Slaney's).
Essentia-specific parameters in the Windowing algorithm: disabled zero-phase windowing and left normalization as True (the default). Hence, to the best of my knowledge, window normalization appears to be the only major difference between the two computations.
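
The effect of the normalization flag on the window itself can be sketched in plain Python. This is a minimal illustration, not code from either library: the frame size of 1024 is hypothetical, and the sketch assumes that essentia's normalized window is rescaled so that its samples sum to 2, which should be checked against the Windowing source.

```python
import math

FRAME_SIZE = 1024  # hypothetical; the frame size used in the experiment is not stated


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def normalize_area(window, target_sum=2.0):
    """Rescale a window so its samples sum to target_sum.

    The target of 2.0 is an assumption about essentia's behaviour.
    """
    scale = target_sum / sum(abs(w) for w in window)
    return [w * scale for w in window], scale


plain = hamming(FRAME_SIZE)
normalized, scale = normalize_area(plain)
scale_db = 20.0 * math.log10(scale)  # gain applied to every sample of the frame

print(f"peak of plain window:      {max(plain):.4f}")
print(f"peak of normalized window: {max(normalized):.6f}")
print(f"gain applied to the frame: {scale_db:.1f} dB")
```

Under these assumptions the normalized window has the same shape at a much lower level, so every spectral magnitude, and hence every mel band energy, is reduced by the same factor.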

Ran two simulations for Task 1 (acoustic scene classification), with and without window normalization. The difference in classification accuracy between the librosa-based and essentia-based systems was:

Normalized = True -> accuracy difference ~ 6 % (the librosa-based system performs better)
Normalized = False -> accuracy difference ~ ±0.3 %

The next plot shows the Hamming window used in librosa and in essentia (Normalized = True). Note the bottom of the plot.

[Figure: hamming_ess_zpwoff_normon]

The next two plots show the mean and standard deviation of the MFCCs computed over 1500 frames of the same audio file, for librosa and essentia. Top: with window normalization. Bottom: without window normalization.

[Figure: mfcc_file7_small]

[Figure: mfcc_file7_small_nfalse]
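
The per-coefficient statistics shown in these plots can be computed along the following lines. This is only a sketch: the two small matrices are made-up values for illustration, not data from the plots, and the names `mfcc_a`, `mfcc_b` and `per_coefficient_stats` are hypothetical.

```python
import statistics


def per_coefficient_stats(frames):
    """Return (means, stds) per MFCC coefficient for a list of per-frame vectors."""
    columns = list(zip(*frames))  # one tuple of values per coefficient
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds


# Made-up MFCC matrices (3 frames x 3 coefficients), standing in for the
# output of two libraries on the same audio frames.
mfcc_a = [[-310.0, 12.0, 4.0], [-305.0, 10.0, 5.0], [-300.0, 11.0, 3.0]]
mfcc_b = [[-410.0, 12.0, 4.0], [-405.0, 10.0, 5.0], [-400.0, 11.0, 3.0]]

mean_a, std_a = per_coefficient_stats(mfcc_a)
mean_b, std_b = per_coefficient_stats(mfcc_b)

# Per-coefficient gap between the two systems
mean_gap = [a - b for a, b in zip(mean_a, mean_b)]
std_gap = [a - b for a, b in zip(std_a, std_b)]
print("mean gap:", mean_gap)
print("std gap: ", std_gap)
```

Comparing the gaps coefficient by coefficient makes it easier to see whether two implementations differ only by an offset in the first coefficient or across the whole vector.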

Comment: This occurs for this particular scenario, audio content (soundscapes) and classifier (GMM). Would something similar happen in a different scenario?

dbogdanov commented 7 years ago

I've created a separate issue, #543, concerning changes in MFCC values due to signal level. Normalized windowing will further contribute to this problem, making mel energy values even smaller.
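
A rough sketch of why a lower signal level can change MFCCs at all: a constant gain shifts every log band energy by the same amount, which only moves the first DCT coefficient, unless some bands fall below a floor and get clipped. Everything here is an assumption for illustration: the band energies, the gain and the floor are made-up numbers (the floor is not essentia's actual silence threshold), and the DCT is a plain unnormalized DCT-II rather than the exact variant used by either library.

```python
import math


def dct2(x):
    """Unnormalized DCT-II of a list of values."""
    n = len(x)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
        for k in range(n)
    ]


def log_energies_db(energies, floor):
    """Convert band energies to dB, clipping anything below `floor`."""
    return [10.0 * math.log10(max(e, floor)) for e in energies]


band_energies = [1e-2, 3e-3, 5e-4, 8e-5, 2e-5, 6e-6]  # made-up mel band energies
power_gain = 0.00362 ** 2                              # hypothetical level reduction
scaled = [e * power_gain for e in band_energies]

# Case 1: floor far below all energies, so nothing is clipped
ref = dct2(log_energies_db(band_energies, floor=1e-30))
low = dct2(log_energies_db(scaled, floor=1e-30))
diff_no_floor = [a - b for a, b in zip(ref, low)]

# Case 2: hypothetical floor of 1e-9, which clips the quietest scaled bands
ref_f = dct2(log_energies_db(band_energies, floor=1e-9))
low_f = dct2(log_energies_db(scaled, floor=1e-9))
diff_floor = [a - b for a, b in zip(ref_f, low_f)]

print("no clipping, coefficients 1..5 change by:", [round(d, 6) for d in diff_no_floor[1:]])
print("with floor,  coefficients 1..5 change by:", [round(d, 3) for d in diff_floor[1:]])
```

In the first case only coefficient 0 differs; in the second, the clipped bands distort the spectral shape and the higher coefficients move as well.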

dbogdanov commented 7 years ago

We might want to change normalized to False by default.

dbogdanov commented 6 years ago

@edufonseca Do you still have your scripts, so that we can evaluate the accuracy difference with normalized windows again? (As we lowered the threshold for silence in #543, maybe the normalization is not a problem any more.)

ChenJunHero commented 6 years ago

I also found large differences in the spectrum amplitude matrix between essentia and librosa. I suspect they come from "Padding" and "StartFromZero". I will try to get the formant frequencies and trace the differences in the results.

sildeag commented 6 years ago

@edufonseca Did you compare with any other apps, e.g. OpenSmile?

edufonseca commented 6 years ago

No. Only with librosa. I think there is a good chance that the differences between librosa and essentia have been mitigated in later Essentia versions.

Eduardo Fonseca Music Technology Group Universitat Pompeu Fabra


dbogdanov commented 6 years ago

One of the main differences with librosa is in the silence threshold. We have done some updates related to that in the mfcc_thresholding branch, but it's not merged yet. You can try to compare with MFCCs computed using that branch.

