Changes in FreesoundExtractor

dbogdanov commented 7 years ago

This issue is to keep track of and discuss changes following the FreesoundExtractor updates (currently in the music_extractor_refactor branch). @ffont

Updates in descriptors:

added reading metadata from files (may be useful)
added a complete bpm_histogram, better naming for related descriptors
added Central Moments Statistics, Flatness and Crest for energy bands
output melbands (only mfcc were output before)
added MelBands 128, removed spectral FrequencyBand.
added dynamic_complexity (a descriptor similar to dynamic range)
output both stdev and var statistics (for backward compatibility, both may be useful)
estimate key using three different key profiles, but keep chord estimation with the default temperley profile
MFCC:
- MFCC logType default has changed from dbpow to dbamp; similar update in GFCC
- silence threshold will be changed from 1e-9 to 1e-10 (not merged to master yet)
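To illustrate what the logType and silence threshold changes mean for the output values, here is a minimal sketch. It assumes dbpow means 10*log10(x), dbamp means 20*log10(x), and that values below the silence threshold are clamped to it before taking the log. The function name `log_compress` is hypothetical and this is not the Essentia implementation.

```python
import math

def log_compress(band_energy, log_type="dbamp", silence_threshold=1e-10):
    """Log-compress one band value after clamping it at a silence floor.

    Hypothetical helper for illustration only; the clamping behaviour
    is an assumption, not taken from the Essentia source.
    """
    value = max(band_energy, silence_threshold)
    if log_type == "dbpow":
        return 10.0 * math.log10(value)
    if log_type == "dbamp":
        return 20.0 * math.log10(value)
    raise ValueError("unsupported log_type: %r" % log_type)

# Same band value under both log types: dbamp is twice dbpow.
old_style = log_compress(0.01, log_type="dbpow", silence_threshold=1e-9)
new_style = log_compress(0.01, log_type="dbamp", silence_threshold=1e-10)

# Silent input: the floor moves when the threshold is lowered.
old_floor = log_compress(0.0, log_type="dbpow", silence_threshold=1e-9)
new_floor = log_compress(0.0, log_type="dbpow", silence_threshold=1e-10)
```

Under these assumptions, switching from dbpow to dbamp doubles every log band value, and lowering the threshold lowers the floor for silent frames, so descriptors computed before and after the change are not directly comparable.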

Discuss:

(DONE) new extractor version: freesound 2.0 (previous version: 0.4) -- change to 0.5?
average_loudness is computed on short 2048-sample frames instead of the 88000-sample frames used in MusicExtractor. No rationale is given for either frame size.
If no beats were detected, the extractor assumes a beat centered at 0 and computes its beat_loudness. We can output zero beat_loudness instead.
We could potentially include replay_gain (skipped for now). This algorithm is only defined for input signals whose size is larger than 0.05ms and is more relevant for music than for sounds (especially short ones).
(DONE) bpm_confidence is always 0 for degara beat tracking method (the algorithm used does not estimate confidence)
(DONE) compute mel-128 vs mel-96 bands? This would affect descriptor files sizes.
everything related to chords should be reviewed and improved (let's keep that for later)
use Windowing with normalized=False? This can lead to improvement for some classification tasks as reported in #525

ffont commented 7 years ago

new extractor version: freesound 2.0 (previous version: 0.4) -- change to 0.5?

Not sure what you propose, update to 2.0 or 0.5? I'd do 0.5.

average_loudness is computed on short 2048-sample frames instead of the 88000-sample frames used in MusicExtractor. No rationale is given for either frame size.

Maybe because there are more short sounds?
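For scale, a quick calculation of what the two frame sizes mean for short sounds. The 44100 Hz sample rate and the non-overlapping frames are assumptions made for illustration (the thread states neither), and both helper names are hypothetical.

```python
def frame_duration_seconds(frame_size, sample_rate=44100):
    """Duration of one analysis frame in seconds."""
    return frame_size / sample_rate

def full_frames(num_samples, frame_size, hop_size=None):
    """Number of complete frames that fit in a signal, without padding.

    hop_size defaults to frame_size (non-overlapping frames), which is
    an assumption for this sketch.
    """
    hop = hop_size if hop_size is not None else frame_size
    if num_samples < frame_size:
        return 0
    return 1 + (num_samples - frame_size) // hop

one_second = 44100  # samples in a one-second sound at the assumed rate

short_frame = frame_duration_seconds(2048)    # roughly 46 ms
long_frame = frame_duration_seconds(88000)    # roughly 2 s

frames_short = full_frames(one_second, 2048)   # several complete frames
frames_long = full_frames(one_second, 88000)   # no complete frame fits
```

With these assumptions, a one-second sound does not fill even one 88000-sample frame, while 2048-sample frames still give a usable number of loudness measurements.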

If no beats were detected, the extractor assumes a beat centered at 0 and computes its beat_loudness. We can output zero beat_loudness instead.

These were changes that Gerard introduced to avoid failing analysis in some edge cases. If you think this makes more sense go ahead.
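A minimal sketch of the "output zero" alternative. This does not reproduce the extractor's actual beat loudness computation: `beat_loudness`, the 0.1 s window, and the mean-energy measure are all hypothetical choices made only to show the edge-case handling.

```python
def beat_loudness(samples, beat_times, sample_rate=44100, window_seconds=0.1):
    """Mean energy in a window centred on each beat.

    Returns 0.0 when no beats were detected, instead of pretending
    there is a beat at t = 0. Illustrative helper, not Essentia code.
    """
    if not beat_times:
        return 0.0  # proposed behaviour: no beats, zero beat loudness

    half = int(window_seconds * sample_rate / 2)
    energies = []
    for t in beat_times:
        centre = int(round(t * sample_rate))
        start = max(0, centre - half)
        end = min(len(samples), centre + half)
        window = samples[start:end]
        if window:  # skip beats that fall outside the signal
            energies.append(sum(x * x for x in window) / len(window))

    return sum(energies) / len(energies) if energies else 0.0

# A signal with no detected beats no longer fails and no longer reports
# the loudness of an invented beat.
silent_case = beat_loudness([0.5] * 1000, [])
```

Returning a plain zero keeps the analysis from failing on beatless sounds and makes it easier to tell "no beats" apart from "a quiet beat" downstream, provided consumers of the descriptor treat 0 as a sentinel.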

We could potentially include replay_gain (skipped for now). This algorithm is only defined for input signals whose size is larger than 0.05ms and is more relevant for music than for sounds (especially short ones).

Let's skip it if it's not too relevant for us.

bpm_confidence is always 0 for degara beat tracking method (the algorithm used does not estimate confidence)

Why don't we change this to use the Percival BPM algorithm? Then we can also include the loop BPM confidence descriptor.

compute mel-128 vs mel-96 bands? This would affect descriptor files sizes.

No strong opinion, whatever feels better for you.

everything related to chords should be reviewed and improved (let's keep that for later)

OK

use Windowing with normalized=False? This can lead to improvement for some classification tasks as reported in #525

You propose to change it to normalized=True then? I don't have a strong opinion. We can unify it with the music descriptor if it makes sense to you.

dbogdanov commented 7 years ago

Using Windowing with normalized=False will drastically affect descriptor values, so we first need to make sure the results we get are desirable.
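A rough sketch of how large the shift can be. It assumes that normalization rescales the window to unit area and then multiplies it by 2; this is an assumption about what normalized=True does and should be checked against the Windowing documentation. The Hann formula, the 2048 frame size, and the helper names are illustrative choices, not taken from the extractor.

```python
import math

def hann(size):
    """Symmetric Hann window with its raw (unnormalized) amplitude."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (size - 1))
            for n in range(size)]

def normalize_window(window):
    """Scale a window to unit area, then by 2 (assumed behaviour)."""
    area = sum(window)
    return [2.0 * w / area for w in window]

size = 2048
raw = hann(size)
normalized = normalize_window(raw)

# Ratio between the spectrum magnitudes obtained with the raw window and
# with the normalized one; it is the same for every bin because the two
# windows differ only by a constant factor.
gain_ratio = sum(raw) / sum(normalized)
gain_db = 20.0 * math.log10(gain_ratio)
```

Under these assumptions every spectral magnitude changes by the same constant factor, which grows with the frame size. Any descriptor that is not scale-invariant (energies, loudness, thresholds such as the MFCC silence threshold) would shift accordingly.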

MTG / essentia

Changes in FreesoundExtractor #582