MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

add support for time-domain filter for pre-emphasis of high frequencies for MFCCs #656

Open · georgid opened this issue 6 years ago

georgid commented 6 years ago

One essential difference between HTK's variant of MFCC and other implementations is the pre-emphasis of high frequencies. As far as I understand, this is done with a simple first-order (FIR) filter, as explained in chapter 5.4 of the HTK Book.
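Written out following eq. 5.1 of the HTK Book, pre-emphasis is a first-order difference over the N samples of a frame, with k = PREEMCOEF. The rule for the first sample (taking s_0 = s_1) is an assumption about HTK's convention; it is what duplicating frame[0] in the snippet below emulates. The transfer function has no feedback term, so in terms of Essentia's IIR it would correspond to numerator=[1, -k] with denominator=[1].

```latex
\begin{aligned}
s'_n &= s_n - k\,s_{n-1}, \qquad n = 2, \dots, N,\\
s'_1 &= (1 - k)\,s_1 \qquad (\text{assuming } s_0 = s_1),\\
H(z) &= 1 - k\,z^{-1}, \qquad k = \texttt{PREEMCOEF} = 0.97.
\end{aligned}
```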

I tried to use Essentia's IIR algorithm to reproduce the MFCCs:

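        import numpy as np
        import essentia.standard as ess

        PREEMPH = 0.97  # PREEMCOEF = 0.97 in htk
        # y[n] = x[n] - PREEMPH * x[n-1]: numerator [1, -PREEMPH], denominator [1]
        preemph_filter = ess.IIR(numerator=[1.0, -PREEMPH], denominator=[1.0])

        # audio, frameSize and hopSize are assumed to be defined beforehand
        # startFromZero = True, validFrameThresholdRatio = 1 : the way htk computes windows
        for frame in ess.FrameGenerator(audio, frameSize=frameSize, hopSize=hopSize, startFromZero=True, validFrameThresholdRatio=1):
                # duplicate the first sample so the first kept output is (1 - PREEMPH) * frame[0]
                frame_doubled_first = np.insert(frame, 0, frame[0])
                preemph_frame = preemph_filter(frame_doubled_first)
                frame = preemph_frame[1:]  # drop the output belonging to the duplicated sample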

But the resulting MFCCs are not the same as the ones from HTK. One can use this repo, which has audio examples and HTK-extracted MFCCs. Once this is solved, the code snippet should be added to the full example.
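To separate the pre-emphasis step from Essentia itself, here is a NumPy-only reference sketch. The function names are hypothetical (not part of Essentia or HTK), and the first-sample rule y[0] = (1 - k) * x[0] is an assumed HTK convention. It pre-emphasizes one frame on its own and checks that the frame-doubling trick from the snippet gives the same output when the filter is the plain difference b = [1, -k] started from zero state. Note that a single-coefficient numerator such as [1 - k] only rescales the frame by (1 - k); it does not emphasize high frequencies.

```python
import numpy as np

PREEMPH = 0.97  # PREEMCOEF in HTK


def htk_preemphasis(frame, k=PREEMPH):
    """Pre-emphasize a single frame on its own: y[n] = x[n] - k * x[n-1].

    The first sample has no predecessor inside the frame, so it is treated
    as its own predecessor: y[0] = (1 - k) * x[0] (assumed HTK convention).
    """
    x = np.asarray(frame, dtype=np.float64)
    y = np.empty_like(x)
    y[0] = (1.0 - k) * x[0]
    y[1:] = x[1:] - k * x[:-1]
    return y


def doubled_first_sample_trick(frame, k=PREEMPH):
    """Emulate the issue's approach with an FIR filter b = [1, -k] and zero
    initial state: prepend frame[0], filter, then drop the first output."""
    x = np.asarray(frame, dtype=np.float64)
    doubled = np.insert(x, 0, x[0])
    filtered = np.convolve(doubled, [1.0, -k])[: len(doubled)]
    return filtered[1:]


if __name__ == "__main__":
    frame = np.random.default_rng(0).standard_normal(400)
    # Both routes should agree sample by sample.
    print(np.allclose(htk_preemphasis(frame), doubled_first_sample_trick(frame)))  # prints True
```

Feeding the same frames to htk_preemphasis and to the Essentia loop, then comparing sample by sample, would show whether the mismatch with HTK already appears at the pre-emphasis step.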

bmcfee commented 5 years ago

Jumping in here: the differences you're seeing are likely due to HTK applying pre-emphasis independently to each frame (HTK Book, page 59, right after eq. 5.1), rather than IIR-filtering the entire signal prior to framing as you do here.

That seems like a strange choice on their part, and your implementation makes more sense to me.
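To make the distinction in the comment above concrete, here is a NumPy sketch that pre-emphasizes the same signal both ways and compares the resulting frames. The helper names are hypothetical. Framing is assumed to keep only complete frames starting at sample 0, with no padding. The per-frame variant uses the assumed HTK first-sample rule. Under these assumptions the two approaches agree everywhere except at the first sample of each frame. There, per-frame filtering gives (1 - k) * x[s], whereas filtering before framing gives x[s] - k * x[s-1] (or x[0] for the very first frame, given zero initial state).

```python
import numpy as np

K = 0.97  # pre-emphasis coefficient (PREEMCOEF in HTK)


def preemphasize_signal(x, k=K):
    """Filter the whole signal once, before framing (zero initial state)."""
    x = np.asarray(x, dtype=np.float64)
    y = x.copy()
    y[1:] = x[1:] - k * x[:-1]
    return y


def preemphasize_frame(frame, k=K):
    """Filter one frame on its own; the first sample acts as its own predecessor."""
    f = np.asarray(frame, dtype=np.float64)
    y = np.empty_like(f)
    y[0] = (1.0 - k) * f[0]
    y[1:] = f[1:] - k * f[:-1]
    return y


def frame_signal(x, frame_size, hop_size):
    """Stack complete frames starting at sample 0; a trailing partial frame is dropped."""
    starts = range(0, len(x) - frame_size + 1, hop_size)
    return np.stack([x[s:s + frame_size] for s in starts])


def compare(x, frame_size=400, hop_size=160, k=K):
    """Absolute per-sample difference between per-frame and whole-signal pre-emphasis."""
    x = np.asarray(x, dtype=np.float64)
    per_frame = np.stack([preemphasize_frame(f, k) for f in frame_signal(x, frame_size, hop_size)])
    whole_signal = frame_signal(preemphasize_signal(x, k), frame_size, hop_size)
    return np.abs(per_frame - whole_signal)


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal(16000)
    diff = compare(x)
    print(diff.shape)                         # (98, 400): frames x samples per frame
    print(np.allclose(diff[:, 1:], 0.0))      # True: only the first sample of each frame differs
```

Since only one sample per frame changes, whether this alone accounts for the MFCC mismatch would still need to be checked against the HTK reference features mentioned in the issue.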