chrispla / mir_ref

A Representation Evaluation Framework for Music Information Retrieval tasks

Implementation of Mel Frequency Cepstral Coefficients (MFCC) #1

Open JoaoSartoreto opened 1 month ago

JoaoSartoreto commented 1 month ago

Description: I intend to implement Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction from audio signals. MFCCs are a popular feature representation for audio such as speech or music, and are widely used in applications like speech recognition and music classification.

Implementation Details:

  1. Libraries:

    • I plan to use the librosa library, as it provides an efficient and easy-to-use MFCC implementation.
    • The numpy library is fundamental for working with arrays and numerical calculations. I can use it to manipulate audio data and perform the mathematical operations needed for MFCC computation.
  2. Functions:

    • Implement a function to apply the pre-emphasis filter to the audio signal. This step boosts the higher frequencies relative to the lower ones before the MFCCs are calculated.
    • Create a function that computes MFCCs for each window of the audio signal. I will use the librosa.feature.mfcc function for this purpose.
    • Implement a function to test MFCCs on different types of audio signals. Compare the results with known labels (e.g., musical instruments) to assess their effectiveness.

Estimated Timeline:

chrispla commented 1 month ago

Hey João, thanks for the issue submission! Your plan sounds good. I haven't really seen pre-emphasis filters being used for music applications - do you have any examples from papers? Another thing to consider is delta features (see librosa.feature.delta). 1st and 2nd order deltas of MFCCs are often used in MIR research (you'd simply concatenate the original MFCC array with the 1st and 2nd order delta arrays).

Keep in mind that the package is currently restricted to selecting a feature by name, without any way to pass options to it. It would be nice in the future to be able to call mfcc(pre_emphasis=True, deltas=True), but for now if you want to implement a couple of different variants of MFCC you'd have to use different feature names (e.g. mfcc, mfcc-deltas).

JoaoSartoreto commented 3 days ago

Hey Christos,

Thanks for the feedback on the issue submission! The idea of using pre-emphasis filters for music applications was suggested by a friend from college. However, I didn’t have a strong academic background on the topic and was trying to find some papers to support the idea. So far, I haven’t found any relevant examples in the literature.

Given this, I plan to follow the standard implementation approach. I’ll focus on using delta features (librosa.feature.delta) and concatenating the original MFCCs with the 1st and 2nd order ones, as you mentioned.

If you have any additional tips or suggestions, I’d love to hear them!

Thanks again!