MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0
Faster TriangularBands implementation #494

Closed dbogdanov closed 7 years ago

dbogdanov commented 7 years ago

Currently, the triangular weighting for spectrum frequency bins is computed on the fly inside compute(). We should consider precomputing the filter weights, as is done in the MelBands algorithm. This would make computation faster, at the cost of higher memory consumption when the number of bands and spectrum bins is very large.
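The trade-off described above can be sketched as follows. This is only an illustration under stated assumptions, not the Essentia implementation: the function names `precomputeTriangularWeights` and `applyBands`, the `bandEdges` layout, and the bin-to-frequency mapping are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the Essentia API): builds one row of weights per band.
// bandEdges holds nBands + 2 ascending frequencies in Hz; band i rises from
// bandEdges[i] to a peak of 1 at bandEdges[i+1] and falls to 0 at bandEdges[i+2].
std::vector<std::vector<float>> precomputeTriangularWeights(
    const std::vector<float>& bandEdges, std::size_t spectrumSize, float sampleRate) {
  assert(bandEdges.size() >= 3 && spectrumSize >= 2);
  const std::size_t nBands = bandEdges.size() - 2;
  // Assumption: bin j of the spectrum sits at j * hzPerBin.
  const float hzPerBin = (sampleRate / 2.0f) / static_cast<float>(spectrumSize - 1);
  // Dense nBands x spectrumSize matrix: this is where the extra memory goes.
  std::vector<std::vector<float>> weights(nBands, std::vector<float>(spectrumSize, 0.0f));
  for (std::size_t i = 0; i < nBands; ++i) {
    const float lo = bandEdges[i], mid = bandEdges[i + 1], hi = bandEdges[i + 2];
    for (std::size_t j = 0; j < spectrumSize; ++j) {
      const float f = static_cast<float>(j) * hzPerBin;
      if (f >= lo && f < mid) {
        weights[i][j] = (f - lo) / (mid - lo);   // ascending slope
      } else if (f >= mid && f < hi) {
        weights[i][j] = (hi - f) / (hi - mid);   // descending slope
      }
    }
  }
  return weights;
}

// With the weights precomputed, the per-frame work is a plain weighted sum.
std::vector<float> applyBands(const std::vector<std::vector<float>>& weights,
                              const std::vector<float>& spectrum) {
  std::vector<float> bands(weights.size(), 0.0f);
  for (std::size_t i = 0; i < weights.size(); ++i) {
    assert(weights[i].size() == spectrum.size());
    for (std::size_t j = 0; j < spectrum.size(); ++j) {
      bands[i] += weights[i][j] * spectrum[j];
    }
  }
  return bands;
}
```

The weights would be built once at configuration time and `applyBands` called per frame. If memory becomes a problem, storing only the non-zero bin range of each band instead of a dense row is one possible mitigation.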

MelBands can later be re-implemented to re-use TriangularBands instead of duplicating the code.

georgid commented 7 years ago

@pabloEntropia
In TriangularBands we should add a boolean parameter normalize that indicates whether the filters keep the default height of 1 or are normalized.
Also, in MelBands the formula for the warping function from spectrum to mel should be given as a parameter.

dbogdanov commented 7 years ago

We can have two normalization types for normalize: "unit_sum" and "unit_max".
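A minimal sketch of what the two normalization types could mean for a single filter. The helper name `normalizeFilter` is hypothetical, and the interpretation of "unit_sum" (weights add up to 1) and "unit_max" (peak weight equals 1) is an assumption based on the option names:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: rescales the weights of one filter in place.
// "unit_sum": the weights add up to 1. "unit_max": the largest weight is 1.
void normalizeFilter(std::vector<float>& weights, const std::string& type) {
  assert(type == "unit_sum" || type == "unit_max");
  if (weights.empty()) return;
  const float denom = (type == "unit_sum")
      ? std::accumulate(weights.begin(), weights.end(), 0.0f)
      : *std::max_element(weights.begin(), weights.end());
  if (denom <= 0.0f) return;  // all-zero filter: leave it untouched
  for (float& w : weights) w /= denom;
}
```

With "unit_sum", wide high-frequency bands no longer accumulate more energy just because they cover more bins; with "unit_max", every triangle keeps a peak of 1.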

palonso commented 7 years ago

I have been working on the reimplementation of TriangularBands as the basis for all the methods relying on triangular filterbanks. I've found another feature that we should consider. In MelBands the triangles themselves are computed on the mel scale, not only their boundary frequencies (as would happen with the current implementation of TriangularBands).

for (int j=jbegin; j<jend; ++j) {
  Real binfreq = j*frequencyScale;
  // in the ascending part of the triangle...
  if ((binfreq >= _filterFrequencies[i]) && (binfreq < _filterFrequencies[i+1])) {
    _filterCoefficients[i][j] = (hz2mel(binfreq) - hz2mel(_filterFrequencies[i])) / fstep1;
  }
  // in the descending part of the triangle...
  else if ((binfreq >= _filterFrequencies[i+1]) && (binfreq < _filterFrequencies[i+2])) {
    _filterCoefficients[i][j] = (hz2mel(_filterFrequencies[i+2]) - hz2mel(binfreq)) / fstep2;
  }
}

To support this, I would need to add a new parameter defining the scale used for the weight computation. So far, also taking #501 into account, the parameter list that makes the function as flexible as possible would be:

[current parameters]

[new parameters]

Note that default values should keep the current behaviour. Are you OK with this implementation?

dbogdanov commented 7 years ago

Triangle weights can be computed either on the Hz scale or on the mel scale, depending on the implementation. We are computing them on the mel scale, which is probably more perceptually relevant than using the Hz scale, although I do not know how the results would differ in practice.
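To get a feel for how the two choices differ for a single bin, here is a small sketch. It assumes the common formula mel = 2595 * log10(1 + hz / 700); whether that matches Essentia's hz2mel exactly is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumed mel formula (a common one); not necessarily identical to Essentia's hz2mel.
inline double hzToMel(double hz) { return 2595.0 * std::log10(1.0 + hz / 700.0); }

// Weight on the ascending slope, interpolated linearly in Hz.
double risingWeightHz(double f, double lo, double peak) {
  assert(lo < peak && f >= lo && f <= peak);
  return (f - lo) / (peak - lo);
}

// Weight on the same slope, interpolated linearly in mel.
double risingWeightMel(double f, double lo, double peak) {
  assert(lo < peak && f >= lo && f <= peak);
  return (hzToMel(f) - hzToMel(lo)) / (hzToMel(peak) - hzToMel(lo));
}
```

Both functions agree at the band edge and at the peak. In between, because this mel mapping is concave, the mel-domain weight is never smaller than the Hz-domain one: for a slope from 1000 Hz to 2000 Hz, a bin at 1500 Hz gets 0.5 in Hz and somewhat more than 0.5 in mel. How much this matters for downstream features is not something the sketch can answer.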

This brings in a few ideas to discuss:

georgid commented 7 years ago

@pabloEntropia In general, the filter boundaries should always be computed in the warped space, correct! The weights of a triangular filter are then computed at the positions of the frequency bins (on the Hz scale). So the weights should be computed in the frequency domain, since the triangles are equilateral there, as the image from librosa shows (image: librosa_slaney_mel_filters). I therefore suggest that we keep the weight computation in TriangularBands as it is for both melScale and SpectrumToCent. So 'scale' (a better name would be 'warping formula') should not be a parameter of TriangularBands, but of MelBands. The rest of the suggested parameters are OK.

dbogdanov commented 7 years ago

After further discussion, it still makes sense to implement "scale" {'linear', 'cents', 'mel', 'htk'} as was suggested because we want to avoid any duplicate code in MelBands.

We can have a better name for this parameter: "weighting" ("type of weighting function for determining triangle area").
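One way such a parameter could select the domain in which the triangle is drawn is sketched below. Everything here is hypothetical: the function names, the 440 Hz reference for cents, and the formulas are assumptions. The thread does not say how 'mel' would differ from 'htk', so only 'linear', 'cents' and 'htk' (the HTK mel formula) are covered.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical mapping from an option string to a warping function.
double warp(double hz, const std::string& scale) {
  if (scale == "linear") return hz;
  if (scale == "cents")  return 1200.0 * std::log2(hz / 440.0);  // assumed 440 Hz reference; needs hz > 0
  if (scale == "htk")    return 2595.0 * std::log10(1.0 + hz / 700.0);
  throw std::invalid_argument("unsupported scale: " + scale);
}

// Triangle weight of frequency f for a band (lo, peak, hi), all given in Hz,
// with the linear interpolation done in the selected warped domain.
double triangleWeight(double f, double lo, double peak, double hi,
                      const std::string& scale) {
  const double wf = warp(f, scale), wl = warp(lo, scale);
  const double wp = warp(peak, scale), wh = warp(hi, scale);
  if (wf >= wl && wf < wp) return (wf - wl) / (wp - wl);  // ascending part
  if (wf >= wp && wf < wh) return (wh - wf) / (wh - wp);  // descending part
  return 0.0;
}
```

With "linear" this reduces to plain Hz-domain interpolation, so it could serve as the default that keeps the current behaviour. For "cents" the reference frequency cancels out of the weight, since only differences of logarithms are used.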