Closed FriedaSmith closed 3 years ago
@VVasanth @abhi-rawat1 Hello. I'm trying to find out what caused the problem. I found that python and java give similar results when the sampling rate is the default. When the sampling rate is not the default, the shapes are different.
Hi. The shape of mfcc is (20, 243) with Python librosa:
wav, _ = librosa.load(wav_path, mono=True, sr=16000)
mfcc = librosa.feature.mfcc(y=wav, sr=16000)
mfcc.shape
However, the shape using your repo is (20 , 729 ) and the results are quite different. How can I get the same result?
JLibrosa jLibrosa = new JLibrosa();
float audioFeatureValues[] = jLibrosa.loadAndRead(audioFilePath, 16000, -1);
float[][] mfccValues = jLibrosa.generateMFCCFeatures(audioFeatureValues, 16000, 20);
System.out.println(".......");
System.out.println("Size of MFCC Feature Values: (" + mfccValues.length + " , " + mfccValues[0].length + " )");
float[][] remfccValues = new float[mfccValues[0].length][mfccValues.length];
Hi Frieda,
If I understand your problem correctly, you are getting identical results between java and python when you use the 'default' sample rate, and the values differ when you use a custom sample rate.
Am I right? Is there any way you could share the file with us so we can perform the analysis?
Thanks!
Processed values of audio files generated from jLibrosa would be very similar to the respective values from Python librosa files, and the mfcc's shape of ./audioFiles/001_children_playing.wav is (40, 345) when the sampling rate is the default.
python:
import librosa
import numpy as np
wav_path = 'I:\\Code\\jlibrosa\\audioFiles\\001_children_playing.wav'
# sr=None keeps the file's native ("default") sampling rate
x, sr = librosa.load(wav_path, sr=None)
mfccs = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=40)
np.savetxt('E:\\Corpus\\mfcc\\001_children_playing.txt', mfccs, fmt='%0.8f')
mfccs.shape
java:
JLibrosa jLibrosa = new JLibrosa();
float audioFeatureValues[] = jLibrosa.loadAndRead(audioFilePath, -1, -1);
float[][] mfccValues = jLibrosa.generateMFCCFeatures(audioFeatureValues, -1, 20);
System.out.println("Size of MFCC Feature Values: (" + mfccValues.length + " , " + mfccValues[0].length + " )");
[Image: mfcc of 001_children_playing.wav in librosa, when the sampling rate is the default.]
[Image: mfcc of 001_children_playing.wav in jlibrosa, when the sampling rate is the default.]
However, when the sampling rate is not the default, the mfcc's shape using Python librosa is (20, 126) and it's (20 , 345 ) using jLibrosa, and the data were greatly different.
python:
# mono=True is librosa's default, so passing it explicitly does not change the result
wav, _ = librosa.load(wav_path, mono=True, sr=16000)
mfcc = librosa.feature.mfcc(y=wav, sr=16000)
np.savetxt('E:\\Corpus\\mfcc\\001_children_playing_16000.txt', mfcc, fmt='%0.8f')
java:
float audioFeatureValues[] = jLibrosa.loadAndRead(audioFilePath, 16000, -1);
float[][] mfccValues = jLibrosa.generateMFCCFeatures(audioFeatureValues, 16000, 20);
System.out.println("Size of MFCC Feature Values: (" + mfccValues.length + " , " + mfccValues[0].length + " )");
[Image: mfcc of 001_children_playing.wav in librosa, when the sampling rate is 16000.]
[Image: mfcc of 001_children_playing.wav in jlibrosa, when the sampling rate is 16000.]
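One possible reading of the shapes reported above: with librosa's default hop_length of 512 and centered frames, the number of MFCC frames is 1 + n_samples // hop_length, so the frame count tracks how many samples were actually analysed. The sketch below assumes, purely for illustration, that the clip is exactly 4 seconds long and natively sampled at 44100 Hz (neither value is stated in this thread), and `expected_frames` is a hypothetical helper, not part of librosa or jLibrosa. Under those assumptions, 345 frames would match audio left at its native rate and 126 frames would match audio resampled to 16000 Hz, which would be consistent with jLibrosa not applying the requested sampling rate.

```python
HOP_LENGTH = 512  # librosa's default hop_length for feature.mfcc


def expected_frames(n_samples, hop_length=HOP_LENGTH):
    """Hypothetical helper: frame count of a centered STFT."""
    return 1 + n_samples // hop_length


# Assumed values, for illustration only (not taken from the thread).
DURATION_S = 4      # assumed clip length in seconds
NATIVE_SR = 44100   # assumed native sampling rate of the wav file
TARGET_SR = 16000   # sampling rate requested in the report

# Frames if the audio is analysed at its native rate (no resampling).
frames_native = expected_frames(DURATION_S * NATIVE_SR)
# Frames if the audio is first resampled to the requested rate.
frames_resampled = expected_frames(DURATION_S * TARGET_SR)

print(frames_native, frames_resampled)
```

The same reasoning may apply to the first report: 729 is exactly 3 times 243, which would fit a 48000 Hz file analysed without being resampled to 16000 Hz, although the file's actual rate is not given here.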
If you specify both nMFCC and n_mels in the generateMFCCFeatures function, I guess you'll get the right size as I do, but I still don't get exactly the same mfccValues as librosa.
What value should n_mels be set to for the right size?
To the value you seek, 20. Actually, I had to set both nMFCC and n_mels to get the right size. Still, the mfcc values I get are different from Python librosa (have a look at the other issue). Hope it helps.
@FriedaSmith - It looks like there is an issue when we read the magnitude values from a file with a custom sampling rate. I will work on this and share the updated build soon. Thanks for reporting.
Apart from this issue, other features should work properly when the 'default' sampling rate is used. Please let us know if that is not the case.