MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

Changes in FreesoundExtractor #582

Open dbogdanov opened 7 years ago

dbogdanov commented 7 years ago

This issue is to keep track of and discuss changes following the FreesoundExtractor updates (currently in the music_extractor_refactor branch). @ffont

Updates in descriptors:

Discuss:

ffont commented 7 years ago
  • new extractor version: freesound 2.0 (previous version: 0.4) -- change to 0.5?

Not sure what you propose, update to 2.0 or 0.5? I'd do 0.5.

  • average_loudness is computed on short 2048-sample frames instead of the 88000-sample frames used in MusicExtractor. No rationale is given for either frame size.

Maybe because there are more short sounds?
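To make the short-sounds argument concrete, here is a minimal sketch that counts how many complete frames fit into sounds of different lengths. It assumes a 44100 Hz sample rate and non-overlapping frames, neither of which is stated in this thread; `frame_count` is a hypothetical helper, not part of Essentia.

```python
def frame_count(duration_s, frame_size, sample_rate=44100):
    """Number of complete, non-overlapping frames that fit in a sound.

    Assumptions (not from the thread): 44100 Hz sample rate, hop size
    equal to the frame size, and incomplete trailing frames are dropped.
    """
    n_samples = round(duration_s * sample_rate)
    return n_samples // frame_size


# Compare the two frame sizes discussed above for a few durations.
for duration in (0.5, 1.0, 5.0):
    short_frames = frame_count(duration, 2048)
    long_frames = frame_count(duration, 88000)
    print(f"{duration:>4} s: {short_frames} frames of 2048, "
          f"{long_frames} frames of 88000")
```

Under these assumptions, a sound shorter than about 2 seconds does not fill even one 88000-sample frame, which would be consistent with choosing shorter frames for a collection with many short sounds.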

  • If no beats were detected, the extractor assumes there is a beat centered at 0 and computes its beat_loudness. We can output zero beat_loudness instead.

These were changes that Gerard introduced to avoid failing analysis in some edge cases. If you think this makes more sense go ahead.
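The "output zero instead" option could look roughly like the sketch below. All names here are hypothetical and only illustrate the control flow; `compute_beat_loudness` stands in for whatever the extractor actually calls, and the shape of the zero output (a single 0.0) is an assumption.

```python
def beat_loudness_or_zero(beat_positions, compute_beat_loudness):
    """Return one loudness value per beat, or a single 0.0 if there are no beats.

    beat_positions: list of beat times in seconds (may be empty).
    compute_beat_loudness: hypothetical callable mapping a beat time to a
        loudness value; it is never called when no beats were detected.
    """
    if not beat_positions:
        # Edge case: no beats detected, so report zero loudness instead
        # of pretending there is a beat centered at 0.
        return [0.0]
    return [compute_beat_loudness(t) for t in beat_positions]


# Usage with a dummy loudness function (illustrative values only).
def dummy(t):
    return 0.5

print(beat_loudness_or_zero([], dummy))
print(beat_loudness_or_zero([0.4, 0.9], dummy))
```

Compared with assuming a beat at 0, this keeps the analysis from failing while making the "no beats" case distinguishable in the output.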

  • We could potentially include replay_gain (skipped for now). This algorithm is only defined for input signals whose size is larger than 0.05ms and is more relevant for music than for sounds (especially short ones).

Let's skip it if it's not too relevant for us.

  • bpm_confidence is always 0 for degara beat tracking method (the algorithm used does not estimate confidence)

Why don't we change this to use the Percival BPM algorithm? Then we can also include the loop BPM confidence descriptor.

  • compute mel-128 vs mel-96 bands? This would affect descriptor files sizes.

No strong opinion, whatever feels better for you.

  • everything related to chords should be reviewed and improved (let's keep that for later)

OK

  • use Windowing with normalized=False? This can lead to improvement for some classification tasks as reported in #525

You propose to change it to normalized=True then? I don't have a strong opinion. We can unify it with the music descriptor if it makes sense to you.

dbogdanov commented 7 years ago

Using Windowing with normalized=False will drastically affect descriptor values. We first need to make sure that the results we get are desirable.
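To give a rough sense of the scale of the change, here is a standard-library sketch that does not call Essentia. It assumes that normalization means scaling the window to an area of 1 and then multiplying by 2, which is my reading of the Windowing documentation and should be checked against the actual implementation. The helper names and the test signal are illustrative only.

```python
import math


def hann(size):
    """Symmetric Hann window (unnormalized, peak value 1)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]


def normalize_window(window):
    """Scale the window to area 1, then by 2 (assumed Essentia behaviour)."""
    area = sum(window)
    return [2.0 * w / area for w in window]


def windowed_energy(frame, window):
    """Energy of the frame after applying the window."""
    return sum((x * w) ** 2 for x, w in zip(frame, window))


size = 2048
# Illustrative input: a 440 Hz sine at an assumed 44100 Hz sample rate.
frame = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(size)]

raw = hann(size)
normalized = normalize_window(raw)

ratio = windowed_energy(frame, normalized) / windowed_energy(frame, raw)
print(f"energy ratio normalized / unnormalized: {ratio:.3e}")
```

Under this assumption the two settings differ by a constant factor that depends on the window area, so any descriptor that is not scale-invariant (energies, loudness-like values, unnormalized band energies) would shift by several orders of magnitude for a 2048-sample window.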