ar1st0crat / NWaves

.NET DSP library with a lot of audio processing functions
MIT License

FeatureExtractor ComputeFrom FastCopy bug #58

Closed ssdesai closed 2 years ago

ssdesai commented 2 years ago

When trying to use MfccExtractor to compute the MFCCs from a DiscreteSignal, I am getting an out-of-bounds exception. I've traced this exception to the ComputeFrom(float[] samples, int startSample, int endSample, IList<float[]> vectors) function in FeatureExtractor.cs.

In this for loop on line 171: for (int sample = startSample; sample <= lastSample; sample += hopSize, i++)

The copy is run on every iteration (line 175): samples.FastCopyTo(block, frameSize, sample);
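For reference, the framing loop quoted above can be sketched in isolation. This is a minimal sketch, not the NWaves source: the `FrameStarts` helper is hypothetical, and it assumes that `lastSample` equals `endSample - frameSize`, which the thread does not confirm.

```csharp
using System;
using System.Collections.Generic;

public static class FramingSketch
{
    // Hypothetical helper: start offsets (in samples) of every full frame
    // that fits between startSample and endSample.
    public static List<int> FrameStarts(int startSample, int endSample, int frameSize, int hopSize)
    {
        if (frameSize <= 0) throw new ArgumentOutOfRangeException(nameof(frameSize));
        if (hopSize <= 0) throw new ArgumentOutOfRangeException(nameof(hopSize));

        var starts = new List<int>();

        // Assumption: the last admissible start leaves room for one whole frame.
        int lastSample = endSample - frameSize;

        for (int sample = startSample; sample <= lastSample; sample += hopSize)
        {
            starts.Add(sample);
        }

        return starts;
    }
}
```

Under that assumption every start satisfies `start + frameSize <= endSample`, so a copy of `frameSize` samples can only overrun when the array holds fewer samples than `endSample` implies.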

My case throws an exception when trying to copy a samples vector of size 704556 to a block of size 2048, with a frameSize of 1103, and sample with a value of 175077.

Looking at the implementation of FastCopyTo in MemoryOperationExtensions.cs: Buffer.BlockCopy(source, sourceOffset * _32Bits, destination, destinationOffset * _32Bits, size * _32Bits); Here source = samples, sourceOffset = sample = 175077, destination = block, destinationOffset = 0, and size = 1103.

This makes it so that the BlockCopy is copying from offset 175077 * 4 = 700308 to offset 700308 + (1103 * 4) = 700308 + 4412 = 704720, which is greater than 704556.

Is there something I'm doing wrong here? I pass a DiscreteSignal made from a float[] into the ComputeFrom function, and it seems that this function should be able to handle an array whose length is not an exact multiple of the frame size.

ar1st0crat commented 2 years ago

What's the value of hopSize?

You mention the number 704556. Is it the signal length (total number of samples) or the total number of bytes? Show your code where you set up MfccOptions.
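The distinction matters because Buffer.BlockCopy takes its offsets and count in bytes, while the loop variables count float elements. A minimal sketch of the bounds arithmetic, using only the numbers quoted in this thread (the `FrameFits` helper is hypothetical and is not part of NWaves):

```csharp
using System;

public static class BlockCopyBounds
{
    private const int BytesPerFloat = sizeof(float); // 4 bytes

    // Hypothetical helper: true if `size` floats starting at element
    // `sourceOffset` lie inside an array of `elementCount` floats.
    // Both sides of the comparison are expressed in bytes.
    public static bool FrameFits(long elementCount, long sourceOffset, long size)
    {
        if (sourceOffset < 0 || size < 0) return false;

        long endByte = (sourceOffset + size) * BytesPerFloat;
        long totalBytes = elementCount * BytesPerFloat;

        return endByte <= totalBytes;
    }
}
```

If 704556 is the number of float elements, the array spans 704556 * 4 = 2818224 bytes and `FrameFits(704556, 175077, 1103)` is true, since the copy ends at byte 704720. If 704556 is the number of bytes, the array holds 704556 / 4 = 176139 floats and `FrameFits(176139, 175077, 1103)` is false, because 175077 + 1103 = 176180 exceeds 176139.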

ssdesai commented 2 years ago

        AudioWave audioWave = audio.GetAudioAtSampleRate(this.sampleRate);
        DiscreteSignal signal = new DiscreteSignal(this.sampleRate, audioWave.Buffer.FloatBuffer);

        MfccOptions options = new MfccOptions
        {
            SamplingRate = 2 * 22050,
            FeatureCount = 13,
        };

        MfccExtractor mfccExtractor = new MfccExtractor(options);

        List<float[]> mfccVectors = mfccExtractor.ComputeFrom(signal);

The code is above. I believe the 704556 is the signal length (it is the number of elements in the samples array). hopSize is 441.

I noticed that if I create a WaveFile object from the file path, I am able to compute the MFCCs of each of the signals it contains, as shown below. One question I do have: what is the difference between the two signals? I am already using the NAudio library for other functions (I have not found support in NAudio for MFCC or FFT operations, which is why I came to NWaves), and I am unsure why NAudio gives a single float buffer while NWaves gives 2 separate signals.

        using (var stream = new FileStream(audioWave.FilePath, FileMode.Open))
        {
            waveContainer = new WaveFile(stream);
        }

        MfccOptions options = new MfccOptions
        {
            SamplingRate = 2 * 22050,
            FeatureCount = 13,
        };

        MfccExtractor mfccExtractor = new MfccExtractor(options);

        List<float[]> mfccVectors = mfccExtractor.ComputeFrom(waveContainer.Signals[1]);

ar1st0crat commented 2 years ago

NWaves gives two DiscreteSignal objects when it reads a stereo signal. In this case, NAudio's FloatBuffer holds interleaved samples, i.e. left[0], right[0], left[1], right[1], etc. Hence, you'll need to extract the left and right channels separately and then take their sum or average (or simply work with only one channel). Possible LINQ solution:

var left  = FloatBuffer.Where((c, i) => i % 2 == 0);
var right = FloatBuffer.Where((c, i) => i % 2 != 0);

var signal = new DiscreteSignal(this.sampleRate, left.Zip(right, (l, r) => l + r));

// or average:
// var signal = new DiscreteSignal(this.sampleRate, left.Zip(right, (l, r) => (l + r) / 2));

// or simply left channel:
// var signal = new DiscreteSignal(this.sampleRate, left);

MfccOptions options = new MfccOptions
{
    SamplingRate = this.sampleRate,
    FeatureCount = 13,
};

MfccExtractor mfccExtractor = new MfccExtractor(options);
List<float[]> mfccVectors = mfccExtractor.ComputeFrom(signal);
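The same de-interleaving can also be written as a plain loop, which avoids the intermediate LINQ sequences. This is only a sketch: the `ToMono` helper is hypothetical (it is not part of NWaves or NAudio), and it assumes the buffer contains exactly two interleaved channels.

```csharp
using System;

public static class StereoSketch
{
    // Hypothetical helper: averages interleaved stereo samples
    // (left[0], right[0], left[1], right[1], ...) into one mono array.
    public static float[] ToMono(float[] interleaved)
    {
        if (interleaved == null) throw new ArgumentNullException(nameof(interleaved));

        // Each stereo frame is one left/right pair; a trailing unpaired sample is ignored.
        int frames = interleaved.Length / 2;
        var mono = new float[frames];

        for (int i = 0; i < frames; i++)
        {
            float left = interleaved[2 * i];
            float right = interleaved[2 * i + 1];
            mono[i] = (left + right) / 2;
        }

        return mono;
    }
}
```

The resulting array can then be passed to the DiscreteSignal constructor together with the sampling rate, in the same way as the float[] in the original code.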

Maybe this video will also be helpful.

Also, you're multiplying the sampling rate 22050 by 2. Is there a reason for this? Is the sampling rate 22050 Hz or 44100 Hz? You need to specify the exact value of the sampling rate (it is constant and does not depend on other properties of the input data).