🐞Describing the bug
I want to train a model that uses MFCCs as the input. Luckily, CoreMLTools is able to convert torchaudio's MFCC :) but there are some numerical differences between the two implementations. I assume those differences arise because CoreML and torchaudio use different default parameters (such as the FFT size, the number of mel filterbanks, and so on). The discrepancy seems to be driven mainly by the higher frequencies (reducing torchaudio's n_mels decreases the difference a bit).
What are the recommended parameters to minimize the discrepancy between CoreML's and torchaudio's MFCC?
To Reproduce
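A minimal sketch of the kind of comparison I am running. The specific parameters (n_mfcc, melkwargs), the input name `waveform`, and the one-second random input are just examples from my setup, not a canonical configuration, and it assumes the conversion succeeds as it does for me:

```python
import numpy as np
import torch
import torchaudio
import coremltools as ct

sample_rate = 16000

# torchaudio MFCC with explicit parameters, so the comparison
# is not at the mercy of changing defaults.
mfcc = torchaudio.transforms.MFCC(
    sample_rate=sample_rate,
    n_mfcc=40,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 64},
)
mfcc.eval()

# One second of random audio as the example input.
waveform = torch.randn(1, sample_rate)

# Trace the transform and convert it with coremltools.
traced = torch.jit.trace(mfcc, waveform)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="waveform", shape=waveform.shape)],
)

# Compare the torchaudio reference against the Core ML output.
# Note: MLModel.predict() only runs on macOS.
ref = mfcc(waveform).numpy()
out = list(mlmodel.predict({"waveform": waveform.numpy()}).values())[0]
print("max abs diff:", np.abs(ref - out).max())
```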
System environment (please complete the following information):