Caldarie / flutter_tflite_audio

Audio classification Tflite package for flutter (iOS & Android). Can support Google Teachable Machine models
MIT License

Different results on android and ios with mfcc model. #46

Open sarthak290 opened 1 year ago

sarthak290 commented 1 year ago

1) I have created my own model that uses MFCC. With your package it works fine on Android, but it gives strange results on iOS (almost the same result every time). Upon going further through your package, I found the following:

In android when we use mfcc: (inside TflteAudioPlugin.java) InputData2D=[-584.5461, 160.31107, 54.33207, 41.93382, -6.2690716, -22.130093, -23.728022, -18.326862, -19.871655, -12.884017, 4.3321896, 7.76119, -10.174459, 14.504196, -8.621558, -29.163612, 1.3063055, 14.564109, -7.867571, 0.08920245, 13.916428, 23.708931, 8.588604, -7.8490815, -3.8324726, -10.441817, -2.3888357, 9.464556, -6.7323833, -3.6811, 2.5033593, -8.471148, -5.328222, 5.5226245, 6.6240654, -7.9169397, -9.550313, 9.346459, 5.020493, -6.127846]

while in ios when we use mfcc: (inside SwiftTflteAudioPlugin.swift) InputData=[-0.0000464e-200,0.0037373e-330,.........................................,0.0037363e-100]

On iOS, the InputData values are extremely small.

Clearly there's a huge difference between the inputData on Android and iOS, which is why iOS gives such strange results (almost the same every time).

2) I am testing on physical devices for both Android and iOS.
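
For reference, one way to check which platform is closer to the training pipeline is to compute MFCCs for the same clip offline with librosa (the library RosaKit is a port of) and compare them with the vectors logged on each device. A rough sketch, assuming 16 kHz mono audio and 40 coefficients; the file name and frame handling are placeholders, not values from the plugin:

```python
# Offline sanity check (not part of the plugin): compute reference MFCCs with
# librosa for the exact clip that was fed to the Android and iOS plugins, then
# compare them with the logged InputData vectors.
# Assumptions: 16 kHz mono audio, 40 coefficients; adjust to the training setup.
import numpy as np
import librosa

SAMPLE_RATE = 16000
N_MFCC = 40

# Load the same recording that was classified on both devices.
signal, _ = librosa.load("test_clip.wav", sr=SAMPLE_RATE, mono=True)

# librosa returns one column of coefficients per frame: shape (n_mfcc, n_frames).
mfcc = librosa.feature.mfcc(y=signal, sr=SAMPLE_RATE, n_mfcc=N_MFCC)

# Depending on how the plugin frames the audio, compare either a single frame
# or the per-coefficient mean against the 40-value vector logged on Android.
android_logged = np.array([-584.5461, 160.31107, 54.33207])  # paste the full logged vector here
print("librosa, first frame:", mfcc[:, 0][:3])
print("librosa, frame mean: ", mfcc.mean(axis=1)[:3])
print("Android log:         ", android_logged[:3])
```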

sarthak290 commented 1 year ago

Any help with this issue?

Caldarie commented 1 year ago

Hi @sarthak290,

Thanks for the detailed response. Appreciate it when someone provides hard data.

For iOS, I too came across this problem, and I'm unsure whether it's a problem with this package or with the native MFCC package on iOS. Furthermore, the native Android and iOS libraries seem to produce different results no matter how much I tweak the parameters.

As mentioned on this thread, I feel it's better to just incorporate MFCC into your TensorFlow model pipeline. I believe decoded wav models do exactly that.

If you could try tweaking the library itself, I would greatly appreciate it if you could share your solution to this problem.

sarthak290 commented 1 year ago

Hi @Caldarie

I have gone through the issue. The main problem is with the RosaKit MFCC package.

The same issue is mentioned in the RosaKit repository here (please take a look). The RosaKit developers acknowledge that their implementation of librosa's MFCC extraction does not work properly on iOS.

A possible solution is to use Aubio for Swift instead of RosaKit. A detailed solution is mentioned here.

The problem is that I don't know how to integrate the Aubio package in Swift. Could you please update your package and replace RosaKit with Aubio? It would be a great help.

sarthak290 commented 1 year ago

To add to your point about implementing MFCC in the model itself: I did some research on that, and my conclusion is that it is not possible. Because the model is trained on MFCC values, we have to give MFCC values as input to the model.

It is not possible to give raw values as input and have the model convert them to MFCC by itself before predicting; the ML model can only predict the result. We have to convert the raw values to MFCC first on the Swift side, and only then can we provide these values to the model for prediction.

If you can shed some more light on this, or suggest a different approach, please share your views.

Caldarie commented 1 year ago

Hi @sarthak290,

Many thanks for checking that out. I really appreciate the in-depth research. Once I have time (perhaps later this month), I will add this alternative package to this plugin. Do keep an eye out once it's released.

Regarding your question about implementing MFCC, take a look at the picture below. It was taken from the decoded wav model provided in this example. Basically, the model takes raw audio and a sample rate as its inputs, and then uses these two inputs to convert the data to MFCC internally. For more information about decoded wav models, take a look here.

[Screenshot (2023-01-09): inputs of the decoded wav model]
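
For illustration, something along these lines achieves the same effect with tf.signal instead of the original decoded wav ops. This is only a sketch: the sample rate is baked in as a constant rather than being a second input, and the frame parameters, layer sizes and label count are placeholders, not values from this plugin.

```python
# A minimal sketch of "MFCC inside the model", assuming a 16 kHz, 1-second input.
# It uses tf.signal instead of the DecodeWav/AudioSpectrogram/Mfcc ops that the
# original decoded wav example is built from; sizes below are placeholders.
import tensorflow as tf

SAMPLE_RATE = 16000   # assumed training sample rate
NUM_LABELS = 3        # assumed number of classes

def audio_to_mfcc(audio):
    # audio: (batch, samples) float32 waveform in [-1, 1]
    stft = tf.signal.stft(audio, frame_length=640, frame_step=320, fft_length=1024)
    spectrogram = tf.abs(stft)
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=40,
        num_spectrogram_bins=spectrogram.shape[-1],
        sample_rate=SAMPLE_RATE)
    mel = tf.tensordot(spectrogram, mel_matrix, 1)
    mel.set_shape(spectrogram.shape[:-1].concatenate(mel_matrix.shape[-1:]))
    log_mel = tf.math.log(mel + 1e-6)
    # keep the first 13 coefficients per frame
    return tf.signal.mfccs_from_log_mel_spectrograms(log_mel)[..., :13]

raw_audio = tf.keras.Input(shape=(SAMPLE_RATE,), name="raw_audio")  # 1 second of mono audio
mfcc = tf.keras.layers.Lambda(audio_to_mfcc, name="mfcc")(raw_audio)
x = tf.keras.layers.Flatten()(mfcc)
x = tf.keras.layers.Dense(128, activation="relu")(x)
outputs = tf.keras.layers.Dense(NUM_LABELS, activation="softmax")(x)
model = tf.keras.Model(raw_audio, outputs)

# After training, export to TFLite as usual; Flutter then only hands over raw samples.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,  # the FFT ops may need Select TF ops, depending on the TF version
]
tflite_model = converter.convert()
```
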
SanaSizmic commented 1 year ago

Hi @sarthak290,

It is possible to give raw values as input; the model can convert the values to MFCC by itself and then predict the result in TensorFlow by using the kapre library. I have trained my model with raw audio data as the input; see the screenshot below.

[Screenshot (2023-01-10): Netron view of the model]
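
A simplified sketch of this kind of model, assuming kapre's get_melspectrogram_layer for the mel spectrogram front end and a tf.signal MFCC step as a stand-in for the logMeltoMfcc node in the graph. The parameters, layer sizes and class count here are illustrative, not the ones from my model:

```python
# Rough sketch: the first layer turns the raw waveform into a log-mel
# spectrogram, so the Flutter side only supplies raw samples.
# Assumptions: 16 kHz mono, 1-second clips, 3 classes; adjust to your setup.
import tensorflow as tf
from kapre.composed import get_melspectrogram_layer

SAMPLE_RATE = 16000   # assumed
NUM_LABELS = 3        # assumed

model = tf.keras.Sequential([
    # raw waveform input, shape (samples, channels)
    get_melspectrogram_layer(
        input_shape=(SAMPLE_RATE, 1),
        n_fft=1024,
        sample_rate=SAMPLE_RATE,
        n_mels=40,
        return_decibel=True,            # dB-scaled mel spectrogram (scaling differs from librosa's log-mel)
        input_data_format='channels_last',
        output_data_format='channels_last'),
    # stand-in for the logMeltoMfcc step: drop the channel axis, take DCT-based MFCCs
    tf.keras.layers.Lambda(
        lambda m: tf.signal.mfccs_from_log_mel_spectrograms(tf.squeeze(m, axis=-1))[..., :13]),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(NUM_LABELS, activation='softmax'),
])
model.summary()
```
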

sarthak290 commented 1 year ago

Hi @SanaSizmic ,

Thanks for the help with my problem. I will definitely try this. Will update you if this works.

As far as I can see, you have supplied mel spectrogram values to the model and then converted them to MFCC using this logMeltoMfcc function, right?

Once again thanks for the help.

sarthak290 commented 1 year ago

@Caldarie It would be a great help if you could integrate this Aubio plugin as soon as possible, as many fellow developers are having the same problem. Until then, I will try @SanaSizmic's solution.

Caldarie commented 1 year ago

@sarthak290 Yeah, I recommend using @SanaSizmic's solution, as you're basically unifying the MFCC conversion with the model. Doing it outside the model introduces external problems that can influence performance; case in point, the dilemma we are having with the native iOS package.

I will update this package when I can, but given my tight schedule, I cannot guarantee when it will be done.