Subtitle-Synchronizer / jlibrosa

A Java library equivalent to librosa for processing audio files and extracting features from them.
MIT License
89 stars, 26 forks

The shape of mfcc #1

Closed FriedaSmith closed 3 years ago

FriedaSmith commented 3 years ago

Hi. The shape of mfcc is (20, 243)

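import librosa

# Load as mono and resample to 16 kHz, then compute MFCCs (n_mfcc defaults to 20)
wav, _ = librosa.load(wav_path, mono=True, sr=16000)
mfcc = librosa.feature.mfcc(y=wav, sr=16000)
mfcc.shape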

However, the shape using your repo is (20, 729) and the results are quite different. How can I get the same result?

    JLibrosa jLibrosa = new JLibrosa();
    float audioFeatureValues[] = jLibrosa.loadAndRead(audioFilePath, 16000, -1);

    float[][] mfccValues = jLibrosa.generateMFCCFeatures(audioFeatureValues, 16000, 20);

    System.out.println(".......");
    System.out.println("Size of MFCC Feature Values: (" + mfccValues.length + " , " + mfccValues[0].length + " )");
    float[][] remfccValues = new float[mfccValues[0].length][mfccValues.length];

FriedaSmith commented 3 years ago

@VVasanth @abhi-rawat1 Hello. I'm trying to find out what caused the problem. I found that Python and Java give similar results when the sampling rate is the default. When it is not the default, the shapes are different.

VVasanth commented 3 years ago

Hi Frieda,

If I understand your problem correctly, you are getting identical results between Java and Python when you use the 'default' sample rate, and the values differ when you use a custom sample rate.

Am I right? Is there any way you could share the file with us so we can analyze it?

Thanks!

FriedaSmith commented 3 years ago

When the sampling rate is the default, the values generated by jLibrosa are very similar to the respective values from Python librosa, and the MFCC shape for ./audioFiles/001_children_playing.wav is (40, 345).

python:

import librosa
import numpy as np

wav_path='I:\\Code\\jlibrosa\\audioFiles\\001_children_playing.wav'
# sr=None keeps the file's native sampling rate
x, sr = librosa.load(wav_path, sr=None)
mfccs = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=40)
np.savetxt('E:\\Corpus\\mfcc\\001_children_playing.txt',mfccs,fmt='%0.8f')
mfccs.shape

java:

    JLibrosa jLibrosa = new JLibrosa();
    float audioFeatureValues[] = jLibrosa.loadAndRead(audioFilePath, -1, -1);
    float[][] mfccValues = jLibrosa.generateMFCCFeatures(audioFeatureValues, -1, 20);
    System.out.println("Size of MFCC Feature Values: (" + mfccValues.length + " , " + mfccValues[0].length + " )");

[Image: MFCC of 001_children_playing.wav in librosa, when the sampling rate is the default.]
[Image: MFCC of 001_children_playing.wav in jLibrosa, when the sampling rate is the default.]

However, when the sampling rate is not the default, the MFCC shape is (20, 126) using Python librosa and (20, 345) using jLibrosa, and the values are very different.

python:

# Resample to 16 kHz on load (mono=True is the default)
wav, _ = librosa.load(wav_path, mono=True, sr=16000)
mfcc = librosa.feature.mfcc(y=wav, sr=16000)
np.savetxt('E:\\Corpus\\mfcc\\001_children_playing_16000.txt',mfcc,fmt='%0.8f')

java:

    float audioFeatureValues[] = jLibrosa.loadAndRead(audioFilePath, 16000, -1);
    float[][] mfccValues = jLibrosa.generateMFCCFeatures(audioFeatureValues, 16000, 20);
    System.out.println("Size of MFCC Feature Values: (" + mfccValues.length + " , " + mfccValues[0].length + " )");

[Image: MFCC of 001_children_playing.wav in librosa, when the sampling rate is 16000.]
[Image: MFCC of 001_children_playing.wav in jLibrosa, when the sampling rate is 16000.]

Githeo commented 3 years ago

If you specify both nMFCC and n_mels in the generateMFCCFeatures function, I guess you'll get the right size, as I do. Still, I don't get exactly the same mfccValues as librosa.

FriedaSmith commented 3 years ago

What value should n_mels be set to for the right size?

Githeo commented 3 years ago

To the value you want, 20. Actually, I had to set both nMFCC and n_mels to get the right size. Still, the MFCC values I get are different from Python librosa (have a look at the other issue). Hope it helps.

VVasanth commented 3 years ago

@FriedaSmith - It looks like there is an issue when we read the magnitude values from a file with a custom sampling rate. I will work on this and share an updated build soon. Thanks for reporting.

Apart from this issue, other features should work properly when the 'default' sampling rate is used. Please let us know if that is not the case.
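
A note on the frame counts reported in this thread: with librosa's defaults (hop_length=512, center=True), the number of MFCC frames is 1 + floor(n_samples / hop_length), so the second dimension of the shape follows the number of samples after loading. The sketch below is only an illustration. The 4-second duration and the 44100 Hz native rate are assumptions, chosen because they reproduce the 345 and 126 frames reported above; neither is stated in the thread, and the helper names are hypothetical.

```python
import math


def expected_frames(n_samples: int, hop_length: int = 512) -> int:
    """Frame count of a centered STFT: 1 + floor(n_samples / hop_length)."""
    return 1 + n_samples // hop_length


def resampled_length(n_samples: int, orig_sr: int, target_sr: int) -> int:
    """Sample count after resampling, rounded up."""
    return math.ceil(n_samples * target_sr / orig_sr)


# ASSUMPTION: a 4-second clip at a native rate of 44100 Hz (not given in the thread).
native_sr = 44100
n_native = 4 * native_sr  # 176400 samples

# No resampling: the frame count follows the native sample count.
print(expected_frames(n_native))  # 345

# Resampled to 16 kHz: fewer samples, hence fewer frames.
n_16k = resampled_length(n_native, native_sr, 16000)  # 64000 samples
print(expected_frames(n_16k))  # 126
```

If these assumptions hold, a frame count that stays at 345 when 16000 is requested would be consistent with the samples not being resampled before feature extraction. This is a guess from the shapes alone, not a confirmed diagnosis.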