dhrebeniuk / RosaKit

LibRosa port to Swift, so the same preprocessing logic can be used on iOS/macOS platforms
MIT License
83 stars · 14 forks

MFCC Function Testing #9

Open · rahul140490 opened this issue 2 years ago

rahul140490 commented 2 years ago

Hi, I want to extract 13 MFCC values from an audio file, and I am using the newly added `mfcc` function like this: `mfcc(nMFCC: 13, nFFT: 2048, hopLength: 512, sampleRate: 22050, melsCount: 128)`

However, the result of this function is a huge multidimensional array containing many `Double` values for each chunk. As I understand it, the result should be a flat array of 13 values for each chunk. Please correct me if I am wrong, and please suggest how to get it working properly.

I also tested this function in `SpectrogramViewController`:

```swift
private func loadData() {
    spectrograms = [[Double]]()
    guard let url = Bundle.main.url(forResource: "test", withExtension: "wav"),
          let soundFile = try? WavFileManager().readWavFile(at: url) else { return }

    let sampleRate = Int(soundFile.sampleRate)
    let rawData = soundFile.data.int16Array

    // Process the file in fixed-size chunks of 16-bit samples. Counting chunks from
    // the sample array avoids dividing by bytesPerSample (which could be 0) and the
    // double "- 1" that skipped a complete chunk and trapped on short files.
    let chunkSize = 66000
    let chunksCount = rawData.count / chunkSize

    for index in 0..<chunksCount {
        // Normalize Int16 PCM samples to Double in [-1, 1).
        let samples = rawData[chunkSize*index..<chunkSize*(index+1)].map { Double($0) / 32768.0 }

        let powerSpectrogram = samples
            .melspectrogram(nFFT: 1024, hopLength: 512, sampleRate: sampleRate, melsCount: 128)
            .map { $0.normalizeAudioPower() }
        spectrograms.append(contentsOf: powerSpectrogram.transposed)

        // Pass the file's real sample rate and the same framing as above. The result is
        // a 2-D matrix covering every frame of the chunk, not a flat list of 13 values.
        let mfccData = samples.mfcc(nMFCC: 13, nFFT: 1024, hopLength: 512, sampleRate: sampleRate, melsCount: 128)
        print("mfcc - \(mfccData)")
    }
}
```

dhrebeniuk commented 2 years ago

@rahul140490, even in librosa (https://librosa.org/doc/main/generated/librosa.feature.mfcc.html) the result is a 2-D matrix.
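To see why the output is a matrix rather than 13 values, here is a small sketch of how many frames librosa produces for a given number of samples. It assumes librosa's documented framing (with `center=True`, the signal is padded by `n_fft // 2` samples on each side before framing); whether RosaKit frames identically is an assumption, not something verified in this thread. The function names are hypothetical helpers.

```python
def librosa_frame_count(n_samples, n_fft=2048, hop_length=512, center=True):
    """Number of STFT frames librosa produces for a mono signal of n_samples."""
    if center:
        # center=True pads n_fft // 2 samples on each side before framing.
        n_samples += 2 * (n_fft // 2)
    if n_samples < n_fft:
        raise ValueError("signal shorter than n_fft")
    return 1 + (n_samples - n_fft) // hop_length


def librosa_mfcc_shape(n_samples, n_mfcc=20, n_fft=2048, hop_length=512, center=True):
    """Expected shape of librosa.feature.mfcc output: (n_mfcc, n_frames)."""
    return (n_mfcc, librosa_frame_count(n_samples, n_fft, hop_length, center))
```

For one 66,000-sample chunk from `loadData()` with `hopLength: 512`, this gives `librosa_mfcc_shape(66000, n_mfcc=13)` of `(13, 129)`: 13 coefficients for each of 129 frames.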

rahul140490 commented 2 years ago

Oh yeah, that seems right. But for testing, would it be possible to compare the results of librosa and RosaKit for the same audio file and the same configuration values?
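One way to run such a comparison is sketched below, assuming you save librosa's output (e.g. with `np.save`) and export RosaKit's output to the same `(n_mfcc, n_frames)` layout (transpose it if it comes out the other way round). Both sides must also see identical samples: `librosa.load` resamples to 22050 Hz by default unless `sr=None` is passed. The helper name and tolerances are arbitrary choices for illustration.

```python
import numpy as np


def compare_mfcc(reference, candidate, rtol=1e-3, atol=1e-2):
    """Compare two MFCC matrices of shape (n_mfcc, n_frames) and report the worst mismatch."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        # A shape mismatch usually means different n_fft / hop_length / center settings.
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    abs_err = np.abs(ref - cand)
    worst = np.unravel_index(abs_err.argmax(), abs_err.shape)
    return {
        "max_abs_error": float(abs_err.max()),
        "worst_index": tuple(int(i) for i in worst),  # (coefficient, frame)
        "allclose": bool(np.allclose(cand, ref, rtol=rtol, atol=atol)),
    }
```

Reporting the worst `(coefficient, frame)` index helps tell apart a scaling problem (every coefficient off) from a framing problem (mismatch concentrated at the edges).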

dhrebeniuk commented 2 years ago

@rahul140490, it's a good question. I've only done simple tests, and in my experience there might be problems, because I quickly ported scipy's dct function to iOS; in scipy it's implemented in C++ and bridged to Python.

I tried to resolve and remove these dependencies, but there might still be problems because type casting works differently in C++ (which is an additional pain).
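When checking a port of the DCT, a slow but unambiguous reference helps. `librosa.feature.mfcc` uses a type-II DCT with `norm='ortho'` by default; the sketch below computes that transform directly from its definition in NumPy (an O(N²) reference, not scipy's FFT-based code). The function name is a hypothetical helper.

```python
import numpy as np


def dct2_ortho(x):
    """Orthonormal DCT-II along the last axis, following scipy's norm='ortho' definition.

    y[k] = s(k) * 2 * sum_n x[n] * cos(pi * k * (2n + 1) / (2N)),
    with s(0) = sqrt(1 / (4N)) and s(k > 0) = sqrt(1 / (2N)).
    """
    x = np.asarray(x, dtype=np.float64)
    n = x.shape[-1]
    k = np.arange(n)[:, None]  # output index (rows)
    m = np.arange(n)[None, :]  # input index (columns)
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # (N, N)
    y = 2.0 * (x @ basis.T)  # unnormalized DCT-II (scipy norm=None)
    scale = np.full(n, np.sqrt(1.0 / (2 * n)))
    scale[0] = np.sqrt(1.0 / (4 * n))
    return y * scale
```

Two quick sanity checks for any port: a constant input of length N maps to `[sqrt(N) * c, 0, 0, ...]`, and the transform matrix is orthogonal, so it preserves the vector's norm.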

rahul140490 commented 2 years ago

Got it. Thanks for the explanation. Could you please help me with a problem I'm facing? I am trying to get a single set of 13 MFCC values for an audio file. That is, the complete .wav file should be processed in one go, not chunk-wise or frame-wise. In simpler terms, I want MFCCs for the whole audio file, not per chunk or per frame as in the `loadData()` function above.
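MFCCs are inherently frame-wise, so a common way to get one vector per file is to compute the full `(n_mfcc, n_frames)` matrix over the entire file (instead of per chunk) and then pool it over time, for example with the mean. Neither librosa nor RosaKit does this pooling automatically, and whichever pooling you pick has to match what your model was trained on. A minimal sketch with a hypothetical helper name:

```python
import numpy as np


def summarize_mfcc(mfcc, method="mean"):
    """Collapse an (n_mfcc, n_frames) MFCC matrix into one vector of n_mfcc values."""
    m = np.asarray(mfcc, dtype=np.float64)
    if m.ndim != 2 or m.shape[1] == 0:
        raise ValueError("expected a non-empty (n_mfcc, n_frames) matrix")
    if method == "mean":
        return m.mean(axis=1)  # average each coefficient over all frames
    if method == "median":
        return np.median(m, axis=1)  # more robust to silent or noisy frames
    raise ValueError(f"unknown method: {method}")
```

For a 13 × 129 matrix this returns 13 values, one per coefficient.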

popigg commented 2 years ago

I just compared the MFCC values extracted from the same audio file using RosaKit and librosa, and they are not the same. What puzzles me is the difference in orders of magnitude:

Librosa [[-5.56532669e+01 -8.21184998e+01 -5.00438271e+01 -3.83153648e+01 -2.21641731e+01 -3.63747215e+01 -4.03212852e+01 -6.56709290e+01 -9.50198364e+01 -1.11017715e+02 -1.25539406e+02 -1.29669861e+02 -1.66457108e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02] .......]

iOS [[1.839091783e-315, 1.83909182e-315, 1.839091858e-315, 1.8390919e-315, 1.839091937e-315, 1.83909198e-315, 1.839092016e-315, 1.839092055e-315, 1.8390921e-315, 1.83909214e-315, 1.83909218e-315, 1.83909222e-315, 1.839092223e-315, 1.83909228e-315, 1.83909235e-315, 1.83909239e-315, 1.83909241e-315, 1.839092465e-315, 1.8390925e-315, 1.38686236e-315, 9.54278096e-316, 6.963632153146e-312, 1.22516989e-315, 1.22517421e-315, 1.83911609e-315, 1.83911613e-315, 1.83911617e-315, 1.83911621e-315, 1.839116254e-315, 1.839116284e-315, 1.83911633e-315, 1.83911637e-315, 1.83911641e-315, 1.839116447e-315, 1.839116487e-315, 1.83911653e-315, 1.83911657e-315, 1.839116576e-315, 1.839116635e-315, 1.8391167e-315, 1.839116744e-315, 1.83911676e-315, 1.83911682e-315, 1.839116847e-315, 6.96440509699e-312, 1.22517011e-315, 1.2251742e-315] ..........]

Where librosa produces values on the order of e+02, RosaKit produces values on the order of e-315. Here are the methods used for extracting the coefficients:

(Two screenshots of the extraction code were attached here.)
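Values around 1e-315 are below the smallest normal double (about 2.2e-308), so they are subnormal numbers. One plausible explanation, not confirmed in this thread, is that integer bits or uninitialized memory are being read as `Double`, rather than any numeric inaccuracy in the DCT. The sketch below (hypothetical helper names) shows that reinterpreting a 64-bit integer near 3.7e8 as a double yields a value in the same range as the RosaKit output, and gives a quick check for such values:

```python
import struct
import sys

import numpy as np


def int_bits_as_double(i):
    """Reinterpret the 64-bit two's-complement bit pattern of i as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<q", i))[0]


def has_subnormals(values):
    """True if any nonzero value is smaller in magnitude than the smallest normal double."""
    a = np.abs(np.asarray(values, dtype=np.float64))
    return bool(np.any((a > 0) & (a < sys.float_info.min)))
```

A genuine MFCC matrix should never trip `has_subnormals`; if the exported RosaKit values do, the problem is likely in how the output buffer is produced or read, not in the math.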

Thanks

rahulkumaratphilips commented 2 years ago

Hi @popigg,

Were you able to figure out how to get results from RosaKit similar to librosa's? Or did you follow some other implementation? Please share, as I have been stuck on this for a year now.

Hi @dhrebeniuk, I've also tried porting multiple implementations of the orthonormal DCT-II, but none of them gives results similar to Python's. Please suggest a way forward; every author seems to have their own version of the DCT.

popigg commented 2 years ago

Hi @rahulkumaratphilips. I found a solution for extracting MFCCs on iOS, but outside RosaKit: I used https://aubio.org/ and it worked 🚀. I had to retrain the model with this new feature extractor. It is tough because it requires manually accumulating the MFCCs for the selected window, and on iOS the implementation is based on a C library, which looks a bit different. If you are interested in following that path, I can help with a gist with some code. Good luck!

rahulkumaratphilips commented 2 years ago

Hi @popigg, yes, I'd be very interested in trying aubio if you can help me get started. Also, how different are aubio's MFCCs from librosa's?

dhrebeniuk commented 2 years ago

@dhrebeniuk, I'm sorry, but today at a checkpoint Russian soldiers ordered me to put my MacBook and iPhone on the ground and leave them. When I get my devices back, I will be able to work on this task.

rahulkumaratphilips commented 2 years ago

Hi @dhrebeniuk / @popigg, I've found a Python implementation of the DCT that outputs the same results as SciPy's DCT. But I need your help porting it to Swift, as I am not very fluent in Python. I'd really appreciate it if you could help me with it, as it's the only missing piece in my MFCC problem. The DCT implementation is:

```python
import numpy as np


def dct2(x, n=None):
    """Unnormalized DCT-II along the last axis (scipy's dct(x, type=2, norm=None))."""
    fft = np.fft.fft
    x = np.atleast_1d(x)

    if n is None:
        n = x.shape[-1]

    # Zero-pad or truncate the last axis to length n.
    if x.shape[-1] < n:
        n_shape = x.shape[:-1] + (n - x.shape[-1],)
        xx = np.concatenate((x, np.zeros(n_shape)), axis=-1)
    else:
        xx = x[..., :n]

    real_x = np.all(np.isreal(xx))

    if real_x and (np.remainder(n, 2) == 0):
        # Even-length real input: reorder as (even samples, odd samples reversed)
        # and take a length-n FFT (Makhoul's method).
        xp = 2 * fft(np.concatenate((xx[..., ::2], xx[..., ::-2]), axis=-1))
    else:
        # General case: FFT of the length-2n symmetric extension, keep the first n bins.
        xp = fft(np.concatenate((xx, xx[..., ::-1]), axis=-1))
        xp = xp[..., :n]

    # Twiddle factors exp(-i * pi * k / (2n)).
    w = np.exp(-1j * np.arange(n) * np.pi / (2 * n))
    y = xp * w

    return y.real if real_x else y
```
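Both branches of `dct2` compute SciPy's unnormalized DCT-II, y[k] = 2 · Σ x[n] cos(πk(2n+1)/(2N)). By default, `librosa.feature.mfcc` instead applies the DCT with `norm='ortho'` along the mel axis of the dB-scaled mel spectrogram and keeps the first `n_mfcc` rows. So two things worth checking in a port (possible causes of mismatch, not confirmed here) are the missing orthonormal scaling and applying the DCT along the time axis instead of the mel axis. A sketch of that step with hypothetical helper names, using an explicit orthonormal DCT matrix:

```python
import numpy as np


def ortho_dct_matrix(n):
    """(n, n) orthonormal DCT-II matrix T, so that T @ v equals scipy's dct(v, norm='ortho')."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    t = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) * np.sqrt(2.0 / n)
    t[0, :] = np.sqrt(1.0 / n)  # the k = 0 row gets the smaller scale factor
    return t


def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """DCT over the mel axis (axis 0) of an (n_mels, n_frames) log-mel matrix,
    keeping the first n_mfcc coefficients for every frame."""
    log_mel = np.asarray(log_mel, dtype=np.float64)
    t = ortho_dct_matrix(log_mel.shape[0])
    return (t @ log_mel)[:n_mfcc, :]
```

Equivalently, each unnormalized `dct2` output can be rescaled by sqrt(1/(4N)) for k = 0 and sqrt(1/(2N)) for k > 0 to reach the orthonormal version.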

popigg commented 2 years ago

Hey @rahulkumaratphilips.

I have created these 2 gists. This is how it works for me using aubio.

Swift MFCC extractor: https://gist.github.com/popigg/3847a4cf71a1898e795f3fa5b8aff9a2

Python MFCC extractor: https://gist.github.com/popigg/de8d8db8ceb7db5adb23d58477a92e74

The aubio installation guide can be found here: https://aubio.org/manual/latest/installing.html

rahulkumaratphilips commented 2 years ago

Hey @popigg,

Thanks a lot for your support. I'll try these out and let you know.

dhrebeniuk commented 2 years ago

@popigg, @rahulkumaratphilips, hello! If you can send a pull request with your changes, please do, and I will approve it.

rahulkumaratphilips commented 2 years ago

Hi @dhrebeniuk, I have a request. I know you're busy with other features, but if you get time, please look into why our DCT function's output doesn't match Python's DCT function. This DCT function is the only reason our MFCC values don't match librosa's.

zac commented 2 years ago

> Hi @dhrebeniuk, I have a request. I know you're busy with other features, but if you get time, please look into why our DCT function's output doesn't match Python's DCT function. This DCT function is the only reason our MFCC values don't match librosa's.

@rahulkumaratphilips, I don't think @dhrebeniuk is unable to work on this because they're busy with features. It's because of Russia's invasion of Ukraine.

I too am encountering some issues with MFCCs not lining up with librosa, but... @dhrebeniuk please take care of yourself and your family and make sure you're safe before you feel like you might want to contribute changes or get back to us. This thread can wait.

Thank you for RosaKit! It's been a great little library and has helped do some cool things that aren't quite covered by Apple's built-in DSP.

rahulkumaratphilips commented 2 years ago

Oh, my bad, @dhrebeniuk, I didn't know you're from Ukraine. Hoping you can find the strength to keep going, knowing how many people around the world support you. One day soon things will be better. Our continuous thoughts of support are with you! Please stay strong and take care of yourself and your loved ones as best you can.