apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

MFCC differences between torchaudio and CoreML #2206

Closed twoertwein closed 5 months ago

twoertwein commented 6 months ago

🐞Describing the bug

I want to train a model that uses MFCCs as its input. Luckily, CoreMLTools is able to convert torchaudio's MFCC transform :) but the converted model's output differs numerically from torchaudio's. I assume the differences arise because CoreML and torchaudio use different parameters (such as the FFT size or the number of mel frequencies). The higher frequencies seem to contribute most: reducing torchaudio's n_mels decreases the difference a bit.

What are the recommended parameters to minimize the discrepancy between CoreML's and torchaudio's MFCC?

To Reproduce

import torch
import torchaudio
import coremltools
import numpy as np

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mfcc = torchaudio.transforms.MFCC()

    def forward(self, wav):
        return self.mfcc(wav)

x, fs = torchaudio.load("test.wav", normalize=True)

model = Model()
model.eval()
model = torch.jit.trace(model, x)

y = model(x).numpy()

core_model = coremltools.convert(
    model, convert_to="mlprogram", inputs=[coremltools.TensorType(shape=x.shape)]
)

core_model.save("newmodel.mlpackage")
core_y = core_model.predict({"wav": x.numpy()})

difference = np.abs(next(iter(core_y.values())) - y).mean()
print(difference)  # 0.04909986

twoertwein commented 5 months ago

This seems to be ordinary precision loss from running the model in float16 instead of float32. The difference goes away when passing compute_precision=coremltools.precision.FLOAT32 to coremltools.convert.
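
For anyone landing here later, a minimal sketch of both halves of that observation. The first two helpers use only the standard library to show how much a value moves when it is rounded to float16; the sample magnitudes are hypothetical and were picked only for illustration, not taken from real MFCC output. The helper name convert_fp32 is also hypothetical; it repeats the convert call from the report with compute_precision added, and imports coremltools lazily so the rest runs without it.

```python
import struct


def to_float16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def mean_abs_rounding_error(values):
    """Mean absolute error introduced by storing each value as float16."""
    return sum(abs(v - to_float16(v)) for v in values) / len(values)


def convert_fp32(traced_model, example):
    """Convert a traced model while keeping float32 compute precision.

    Hypothetical helper: same call as in the report, plus compute_precision.
    coremltools is imported lazily so the helpers above work without it.
    """
    import coremltools

    return coremltools.convert(
        traced_model,
        convert_to="mlprogram",
        inputs=[coremltools.TensorType(shape=example.shape)],
        compute_precision=coremltools.precision.FLOAT32,
    )


if __name__ == "__main__":
    # Hypothetical magnitudes, chosen only to illustrate the effect.
    samples = [-312.7, -45.3, 3.14159, 27.9, 101.3]
    print(mean_abs_rounding_error(samples))
```

float16 keeps 11 significant bits, so adjacent representable values between 64 and 128 are 0.0625 apart and rounding alone can shift a value in that range by up to about 0.03. Whether output rounding accounts for the whole difference reported above is an assumption; intermediate float16 arithmetic inside the converted model likely contributes as well.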