Open edufonseca opened 7 years ago
I've created a separate issue (#543) concerning changes in MFCC values due to signal level. Normalized windowing will further contribute to this problem, making mel energy values even smaller.
We might want to change normalized to False by default.
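To give a rough idea of how much smaller the energies get, here is a minimal sketch in plain Python. It assumes that normalizing a window means rescaling it so that its samples sum to 2 (an area of 1, then a factor of 2); that convention is an assumption here and should be checked against the Windowing code. The helpers `hamming`, `normalize_window` and `frame_energy` are illustrative only, not essentia functions.

```python
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def normalize_window(w, scale=2.0):
    """Rescale w so that its samples sum to `scale` (assumed convention)."""
    total = sum(w)
    return [scale * x / total for x in w]

def frame_energy(frame, window):
    """Energy of the windowed frame."""
    return sum((s * w) ** 2 for s, w in zip(frame, window))

n = 1024
# Toy test frame: a low-amplitude sinusoid.
frame = [0.1 * math.sin(2 * math.pi * 50 * i / n) for i in range(n)]

w = hamming(n)
wn = normalize_window(w)

e_plain = frame_energy(frame, w)
e_norm = frame_energy(frame, wn)

# Because normalization is a single scalar gain, this drop does not depend
# on the frame content, only on the window length and shape.
drop_db = 10 * math.log10(e_plain / e_norm)
print(f"energy drop due to normalization: {drop_db:.1f} dB")
```

Under that assumption the drop is 20*log10(sum(w)/2), so it grows with the frame size: longer windows push the energies further down.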
@edufonseca Do you still have your scripts for evaluating the accuracy difference? Could you run them again with normalized windows? (As we lowered the threshold for silence in #543, maybe the normalization is not a problem any more.)
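One reason the threshold matters here: a constant gain on the window multiplies every mel band energy by the same factor, which adds the same constant to every log band energy, and a DCT-II maps a constant offset onto coefficient 0 only. So as long as no band is clipped by a floor, the higher MFCCs should not change. A minimal sketch of that reasoning, using an unnormalized DCT-II and made-up band energies (`dct2` and `toy_mfcc` are hypothetical helpers; essentia and librosa apply their own DCT scaling and liftering):

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a list of floats."""
    n = len(x)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
        for k in range(n)
    ]

def toy_mfcc(band_energies, gain=1.0):
    """DCT-II of the log band energies after a power gain, with no floor."""
    return dct2([math.log(gain * e) for e in band_energies])

# Made-up mel band energies for one frame.
bands = [0.5, 0.2, 0.05, 0.01, 0.002, 0.001]

reference = toy_mfcc(bands)
attenuated = toy_mfcc(bands, gain=1e-5)  # hypothetical attenuation

# Only the first entry should differ noticeably from zero.
deltas = [a - r for a, r in zip(attenuated, reference)]
print(deltas)
```

If the plots show differences in the higher coefficients too, that would point to clipping at the silence threshold rather than to the gain itself.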
I also found large differences in the spectrum amplitude matrix between essentia and librosa. I suspect this is due to "Padding" and "StartFromZero". I will try to get the formant frequencies and trace the difference in the results.
@edufonseca Did you compare with any other apps eg. OpenSmile, etc.?
Eduardo Fonseca Music Technology Group Universitat Pompeu Fabra
One of the main differences with librosa is in the silence threshold. We have made some updates related to that in the mfcc_thresholding branch, but it's not merged yet. You can try to compare with MFCCs computed using that branch.
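For anyone comparing the two: the threshold acts as a floor applied to the band energies before the logarithm, so all energies below it collapse to the same log value. A toy sketch in plain Python (the floor values and band energies are made up and are not the actual defaults of either library; `log_band_energies` is a hypothetical helper):

```python
import math

def log_band_energies(bands, floor):
    """Natural log of the band energies, clipped from below at `floor`."""
    return [math.log(max(b, floor)) for b in bands]

# Made-up band energies for one frame, then attenuated by a power gain
# such as the one introduced by a normalized window.
bands = [1e-4, 1e-7, 1e-9]
gain = 1e-6
quiet = [b * gain for b in bands]

high_floor = log_band_energies(quiet, floor=1e-9)   # every band is clipped
low_floor = log_band_energies(quiet, floor=1e-30)   # no band is clipped

print(high_floor)  # three identical values: the spectral shape is lost
print(low_floor)   # three distinct values: the spectral shape survives
```

With the higher floor the frame looks like silence to the DCT, which is the kind of discrepancy a lower threshold is meant to avoid.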
A comparison was made of the MFCC computation in librosa and essentia, using data from the DCASE 2016 challenge and its baseline system (MFCC+GMM) for Task 1 - Acoustic scene classification.
Procedure:
Run two simulations for Task 1 - Acoustic scene classification: with and without normalization. Report the difference of classification accuracy found between librosa and essentia-based systems:
The next plot shows the Hamming window used in librosa and in essentia (Normalized = True). Note the bottom of the plot.
The next two plots show the mean and std of MFCCs computed over 1500 frames of the same audio file, for librosa and essentia. Top: with window normalization. Bottom: without window normalization.
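For reference, the per-coefficient statistics behind plots like these can be computed as in the sketch below. It assumes each extractor's output has already been collected as a list of frames, each frame being a list of MFCC values. The helper names and the toy numbers are illustrative only; this is not the script used for the plots.

```python
import statistics

def per_coefficient_stats(frames):
    """Return (means, stds) per MFCC coefficient over a list of frames."""
    columns = list(zip(*frames))  # one tuple per coefficient
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds

def compare_stats(frames_a, frames_b):
    """Per-coefficient differences of mean and std between two extractors."""
    mean_a, std_a = per_coefficient_stats(frames_a)
    mean_b, std_b = per_coefficient_stats(frames_b)
    mean_diff = [a - b for a, b in zip(mean_a, mean_b)]
    std_diff = [a - b for a, b in zip(std_a, std_b)]
    return mean_diff, std_diff

# Toy data: 2 frames x 2 coefficients per "library" (made-up numbers).
frames_lib_a = [[1.0, 2.0], [3.0, 4.0]]
frames_lib_b = [[0.0, 2.0], [2.0, 4.0]]

mean_diff, std_diff = compare_stats(frames_lib_a, frames_lib_b)
print(mean_diff, std_diff)
```

A constant offset in the mean with matching std, as in the toy data, would suggest a pure gain difference; a difference in std suggests that something nonlinear, such as a floor, is involved.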
Comment: This occurs for this particular scenario, audio content (soundscapes) and classifier (GMM). Would something similar happen in a different scenario?