Difference between kaldifeat mfcc feature and torchaudio mfcc feature

binhtranmcs commented 8 months ago

Currently, I am using torchaudio.transforms.MFCC to compute features. I now need to use the C++ API of kaldifeat, but the features it extracts are different. Here is the script I used:

import kaldifeat
import torchaudio
import torch

torch.manual_seed(0)
torch.set_printoptions(precision=3, sci_mode=False)

wave = torch.rand(1, 400)

# torchaudio mfcc
transform = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=13,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 23, "center": False, "window_fn": torch.hann_window},
)
ta_mfcc = transform(wave)[0].transpose(0, 1)

# kaldi compliance mfcc
# (the input is scaled from [-1, 1] to the int16 range that Kaldi works with)
kaldi_mfcc = torchaudio.compliance.kaldi.mfcc(
    wave * 2**15,
    num_ceps=13,
    num_mel_bins=23,
    use_energy=False,
    window_type="hanning",
)

# kaldifeat mfcc
opts_mfcc = kaldifeat.MfccOptions()
opts_mfcc.use_energy = False
opts_mfcc.num_ceps = 13
opts_mfcc.frame_opts.window_type = "hanning"
opts_mfcc.frame_opts.dither = 0
opts_mfcc.mel_opts.num_bins = 23
mfcc = kaldifeat.Mfcc(opts_mfcc)
kaldifeat_mfcc = mfcc(wave[0] * 2**15)

ft = torch.cat([ta_mfcc, kaldi_mfcc, kaldifeat_mfcc]).transpose(0, 1)

print(ft)

The result is:

tensor([[ 92.246, 115.379, 115.379],
        [-10.815, -34.377, -34.377],
        [  2.703, -11.685, -11.685],
        [  0.333, -15.649, -15.649],
        [  4.773,  -7.279,  -7.280],
        [  1.226, -13.743, -13.743],
        [  2.976, -10.609, -10.609],
        [  6.198,  -2.479,  -2.479],
        [  4.769,  -4.193,  -4.193],
        [  5.665,  -0.910,  -0.910],
        [  5.217,  -0.147,  -0.147],
        [  4.096,  -2.355,  -2.355],
        [  5.315,   1.021,   1.021]])

The result from torchaudio.compliance.kaldi.mfcc (second column) is the same as that of kaldifeat (third column), but both differ from torchaudio.transforms.MFCC (first column).

Is there a way to configure kaldifeat so that the result is the same as that of torchaudio.transforms.MFCC? Thanks in advance.

csukuangfj commented 8 months ago

ta_mfcc = transform(wave)[0].transpose(0, 1)

Is there a reason to not use wave * 32768?

binhtranmcs commented 8 months ago

Is there a reason to not use wave * 32768?

I think torchaudio expects input in the range [-1, 1]. Even with wave * 32768, though, the results are still different.
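A rough way to see why rescaling alone is unlikely to reconcile the outputs: multiplying the waveform by a constant c multiplies every power-spectrum value, and therefore every mel energy, by c squared, which adds the same constant 2*log(c) to each log-mel energy. A DCT-II maps a constant vector onto coefficient 0 only, so in this idealized setting only the first cepstral coefficient moves. The sketch below uses plain Python with made-up mel energies and an unnormalized DCT-II; it is an assumption-laden illustration and ignores the log offset, dB clamping, and liftering that real implementations may apply.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a list of floats."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(n)
    ]

# Hypothetical mel-band energies for one frame (illustrative values only).
mel_energies = [0.8, 1.5, 2.1, 0.4, 0.9, 3.2]

scale = 2 ** 15  # factor used to move from [-1, 1] to the int16 range

log_mel = [math.log(e) for e in mel_energies]
# Scaling the waveform by `scale` scales power (and mel energies) by scale**2.
log_mel_scaled = [math.log(e * scale ** 2) for e in mel_energies]

cepstra = dct2(log_mel)
cepstra_scaled = dct2(log_mel_scaled)

# Per-coefficient shift caused purely by the input scaling.
diffs = [b - a for a, b in zip(cepstra, cepstra_scaled)]
for k, d in enumerate(diffs):
    print(f"c{k}: shift = {d:.6f}")
```

In the table above every coefficient differs between the first column and the other two, not just the first one, which suggests that input scale is not the whole explanation.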

csukuangfj commented 8 months ago

kaldi_mfcc = torchaudio.compliance.kaldi.mfcc(wave * 2**15, window_type="hanning")

# kaldifeat mfcc
opts_mfcc = kaldifeat.MfccOptions()
opts_mfcc.use_energy = False
opts_mfcc.frame_opts.window_type = "hanning"

Is there a reason to not use the same parameters for torchaudio.transforms.MFCC? For instance, you use hanning for both kaldifeat and torchaudio.compliance.kaldi.mfcc, but you leave torchaudio.transforms.MFCC to use its default value, though I am not sure whether its default value is hanning or not.

Also, you are using n_mfcc=13, for torchaudio.transforms.MFCC. Is there any reason to not use the same value for kaldifeat and torchaudio.compliance.kaldi.mfcc?

If you want to produce the same features for the same input, please ensure

- you indeed use the same input
- you indeed use the same arguments

binhtranmcs commented 8 months ago

For instance, you use hanning for both kaldifeat and torchaudio.compliance.kaldi.mfcc, but you leave torchaudio.transforms.MFCC to use its default value, though I am not sure whether its default value is hanning or not.

Also, you are using n_mfcc=13, for torchaudio.transforms.MFCC. Is there any reason to not use the same value for kaldifeat and torchaudio.compliance.kaldi.mfcc?

@csukuangfj, hanning is the default of torchaudio.transforms.MFCC and num_ceps=13 is the default of kaldifeat.

I just updated the Python code above to add those arguments. The result is unchanged.

csukuangfj commented 8 months ago

~Please show your complete code after your changes~

csukuangfj commented 8 months ago

Also, have you read and checked the following two points?

If you want to produce the same features for the same input, please ensure

- you indeed use the same input
- you indeed use the same arguments

csukuangfj commented 8 months ago

I strongly suggest that you have a look at https://pytorch.org/audio/main/_modules/torchaudio/transforms/_transforms.html#MFCC

You need to find all the arguments of MFCC and compare them with kaldifeat and torchaudio.compliance.kaldi.mfcc.
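One possible way to do that comparison systematically is to dump each callable's keyword defaults and diff them. The sketch below uses only the standard library; `default_args`, `diff_settings`, `extractor_a`, and `extractor_b` are hypothetical names introduced for illustration, and the stand-in functions do not reproduce the real signatures of any library.

```python
import inspect

def default_args(fn):
    """Return {parameter name: default} for parameters that have a default."""
    sig = inspect.signature(fn)
    return {
        name: p.default
        for name, p in sig.parameters.items()
        if p.default is not inspect.Parameter.empty
    }

def diff_settings(a, b):
    """Return keys whose values differ, or that exist on one side only."""
    missing = object()
    keys = sorted(set(a) | set(b))
    return {
        k: (a.get(k, "<absent>"), b.get(k, "<absent>"))
        for k in keys
        if a.get(k, missing) != b.get(k, missing)
    }

# Stand-in functions so the sketch runs without torchaudio installed.
# Their parameter names and defaults are hypothetical, not the real APIs.
def extractor_a(wave, num_ceps=13, window_type="povey", log_mels=False):
    ...

def extractor_b(wave, num_ceps=13, window_type="hann", log_mels=True):
    ...

result = diff_settings(default_args(extractor_a), default_args(extractor_b))
print(result)
```

With torchaudio installed, passing `torchaudio.compliance.kaldi.mfcc` or `torchaudio.transforms.MFCC` to `default_args` should list their keyword defaults. The libraries name equivalent options differently (for example `num_ceps` versus `n_mfcc`), so the names would have to be mapped by hand before diffing, and the kaldifeat options object would need to be inspected attribute by attribute instead.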

You need to spend time figuring out the reason by yourself.

For instance, you leave log_mels at its default value of False for MFCC, which is not correct if you want to get the same features as kaldifeat and torchaudio.compliance.kaldi.mfcc.
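To illustrate why this flag matters: judging from the linked source, log_mels=False sends the mel spectrogram through a decibel conversion (10*log10), whereas log_mels=True takes a natural logarithm with a small offset, which is closer to what Kaldi-style pipelines compute. The sketch below compares the two scalings in plain Python on made-up mel energies. The offset value is an assumption, and the sketch omits any dB clamping, so treat it as an illustration rather than a reimplementation of either library.

```python
import math

# Hypothetical mel-band energies for one frame (illustrative values only).
mel_energies = [0.8, 1.5, 2.1, 0.4, 0.9, 3.2]

LOG_OFFSET = 1e-6  # assumed small offset added before the natural log

def to_db(power):
    """Decibel scaling of a power value: 10 * log10(power)."""
    return 10.0 * math.log10(power)

def to_log(power):
    """Natural-log scaling with a small offset to avoid log(0)."""
    return math.log(power + LOG_OFFSET)

db_values = [to_db(e) for e in mel_energies]
log_values = [to_log(e) for e in mel_energies]

# Apart from the offset, dB values equal the natural log times 10 / ln(10).
ratio = 10.0 / math.log(10.0)
for e, d, l in zip(mel_energies, db_values, log_values):
    print(f"energy={e:.2f}  dB={d:8.3f}  ln={l:8.3f}  ln*ratio={l * ratio:8.3f}")
```

Because the DCT is linear, that constant factor carries through to every cepstral coefficient, so in this idealized setting the two scalings cannot agree even when all other options match. Setting log_mels=True removes this particular mismatch but does not by itself guarantee identical features, since other options may still differ.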

csukuangfj / kaldifeat

Difference between kaldifeat mfcc feature and torchaudio mfcc feature #87