New Music Extractor

This is a summary of TODOs for the new Music Extractor and a list of known inconsistencies with the previous version:

Add AcoustID fingerprints (#517)
ReplayGain
- Make sure our ReplayGain value is 100% correct (compare with a reference implementation) (@alastair)
- Should we change EqLoudness normalization from using ReplayGain to EBU R128?
MFCC
- MFCC logType default has changed from dbpow to dbamp. We can keep dbpow for consistency; however, dbamp is formally more correct. Similar updates apply to GFCC for consistency.
- Decide whether to use Windowing with normalized=False, which can improve some classification tasks as reported in #525
- The silence threshold has changed from 1e-9 to 1e-10. We need to evaluate both settings further (e.g. in DCASE classification and in simple genre classification).
- Only mean, cov and icov are currently computed. Compute all other stats as well?
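To make the logType and silence threshold items above concrete, here is a minimal sketch in plain Python. It assumes that dbpow means 10*log10(x), that dbamp means 20*log10(x), and that values below the silence threshold are clamped to it before taking the logarithm. The helper `to_db` is hypothetical and is not the Essentia implementation.

```python
import math

# Hypothetical helper, not part of Essentia: convert one band energy to dB.
# Assumption: "dbpow" = 10*log10(x), "dbamp" = 20*log10(x), and the input is
# floored at the silence threshold before the logarithm is taken.
def to_db(energy, log_type="dbamp", silence_threshold=1e-10):
    factor = {"dbpow": 10.0, "dbamp": 20.0}[log_type]
    return factor * math.log10(max(energy, silence_threshold))

band_energy = 1e-4
# The two log types differ by a constant factor of two on the same input.
print(to_db(band_energy, "dbpow"), to_db(band_energy, "dbamp"))

# The floor applied to silent bands moves with the threshold.
for threshold in (1e-9, 1e-10):
    print(threshold, to_db(0.0, "dbpow", threshold), to_db(0.0, "dbamp", threshold))
```

Under these assumptions, switching from dbpow to dbamp doubles every log-band value, and moving the threshold from 1e-9 to 1e-10 lowers the floor for silent bands from about -90 to -100 with dbpow, or from about -180 to -200 with dbamp. Both changes can shift feature statistics for quiet material, which is why the evaluation is needed.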
Chroma
- Would it be useful to provide 12-bin HPCP vectors in addition to (or instead of) the 36-bin ones?
- Fine-tune HPCP parameters. Which normalization to use? (#348)
- Add Constant-Q spectrogram? (#530)
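For the 12-bin question above, one option is to derive the 12-bin vector from the 36-bin one rather than compute it separately. This is a minimal sketch with two assumptions: bin 0 of the 36-bin vector is centered on a semitone, so each semitone covers its center bin plus one neighbor on each side, and the result is normalized to unit maximum. The function `fold_hpcp` is hypothetical, and the normalization choice is only illustrative since it is still open (#348).

```python
# Hypothetical helper: fold a fine-resolution HPCP into one bin per semitone.
# Assumption: bins 0, 3, 6, ... are semitone centers; neighbors wrap around.
def fold_hpcp(hpcp, bins_per_semitone=3):
    n = len(hpcp)
    if bins_per_semitone % 2 == 0 or n % bins_per_semitone != 0:
        raise ValueError("expected an odd number of bins per semitone dividing len(hpcp)")
    half = bins_per_semitone // 2
    folded = []
    for semitone in range(n // bins_per_semitone):
        center = semitone * bins_per_semitone
        # Sum the center bin and its neighbors, wrapping at the octave boundary.
        folded.append(sum(hpcp[(center + k) % n] for k in range(-half, half + 1)))
    peak = max(folded)
    # Unit-max normalization (illustrative choice only).
    return [v / peak for v in folded] if peak > 0 else folded

hpcp36 = [0.0] * 36
hpcp36[35], hpcp36[0], hpcp36[1] = 0.5, 1.0, 0.5  # energy around semitone 0
hpcp36[3] = 1.0                                   # energy exactly on semitone 1
hpcp12 = fold_hpcp(hpcp36)
print(len(hpcp12), hpcp12[:2])
```

Folding discards the sub-semitone detail that the 36-bin vector keeps, so it does not by itself settle whether to output both.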
Onsets
- Use SuperFlux for onset detection if it performs better in evaluation
Key
- Implement Key algorithm improvements (#492)
Chords
- Decide which key profile to use for chord detection (#522). Should we use three different profiles (similar to key)?
- Use ChordsDetectionBeats instead of ChordsDetection. This makes sense as we also output beat positions, to which chord values can then be aligned.
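The idea of aligning chords to beats can be illustrated with a small sketch. It takes a simplified approach, a majority vote over frame-wise chord labels inside each beat interval, and it is not the algorithm used by ChordsDetectionBeats. The function `chords_per_beat` and the input data are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical helper: one chord label per beat interval, by majority vote.
# frame_times must be sorted; beats are beat positions in seconds.
def chords_per_beat(frame_times, frame_chords, beats):
    result = []
    for start, end in zip(beats, beats[1:]):
        lo = bisect_left(frame_times, start)  # first frame at or after the beat
        hi = bisect_left(frame_times, end)    # first frame at or after the next beat
        labels = frame_chords[lo:hi]
        chord = Counter(labels).most_common(1)[0][0] if labels else None
        result.append((start, chord))
    return result

frame_times = [i / 10 for i in range(10)]  # one frame every 0.1 s
frame_chords = ["C", "C", "Am", "C", "C", "G", "G", "G", "F", "G"]
beats = [0.0, 0.5, 1.0]
aligned = chords_per_beat(frame_times, frame_chords, beats)
print(aligned)
```

With one label per beat interval, short frame-level errors such as the isolated "Am" and "F" frames are absorbed, and the chord sequence shares its time grid with the beat positions that are already part of the output.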
Spectral contrast
- Fixed the spectral band computation; the band frequencies are now closer to the ones stated in the paper
Should we include Panning?
Does computing spectral_rms make sense after applying ReplayGain?

Speed optimization

Currently, audio is loaded five times:

AudioLoader for computing metadata (audio output is not used)
AudioLoader for computing EBU R128 loudness on the stereo segment from startTime to endTime
EqloudLoader (mono audio output) for computing ReplayGain on the startTime to endTime segment
EasyLoader (mono audio output) to load the segment and apply the ReplayGain value computed in the previous step
- low-level features
- rhythm features
- tonal tuning frequency
Another EasyLoader (mono audio output) with the same parameters
- tonal features (requires the tuning frequency computed in the previous step)
- beats loudness (requires the beats computed in the previous step)

The first three steps can be merged into one computation network. Still, there is a trade-off between computation speed and code complexity.
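The claim about merging can be checked by looking only at the data dependencies between the five loading steps. The sketch below groups steps into passes so that each step runs after everything it depends on. The step names and the function `schedule_passes` are hypothetical labels for the steps listed above. The sketch ignores practical differences between the loaders (stereo vs mono output, equal-loudness filtering), which is where the code complexity comes from.

```python
# Hypothetical helper: group steps into passes by dependency depth.
def schedule_passes(dependencies):
    level = {}

    def resolve(step, seen=()):
        if step in level:
            return level[step]
        if step in seen:
            raise ValueError("dependency cycle at " + step)
        deps = dependencies[step]
        # A step with no dependencies gets level 0; otherwise one more than
        # the deepest step it depends on.
        level[step] = 1 + max((resolve(d, seen + (step,)) for d in deps), default=-1)
        return level[step]

    for step in dependencies:
        resolve(step)
    passes = {}
    for step, lv in level.items():
        passes.setdefault(lv, []).append(step)
    return [passes[lv] for lv in sorted(passes)]

# Illustrative labels for the five loading steps described above.
steps = {
    "metadata": [],
    "ebu_r128_loudness": [],
    "replay_gain": [],
    "lowlevel_rhythm_tuning": ["replay_gain"],
    "tonal_beats_loudness": ["lowlevel_rhythm_tuning"],
}
passes = schedule_passes(steps)
print(len(passes), passes)
```

Under this model the first three steps have no mutual dependencies and fit in one pass, while the last two must stay sequential because each needs results from the step before it. That gives a lower bound of three passes over the audio instead of five.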

MTG / essentia

New Music Extractor #533
