Music descriptors QA scripts wishlist

Python unit tests provide basic sanity checks for descriptors, but they are not always sufficient to assess the descriptors' intrinsic quality and suitability for practical applications. We are considering adding Python scripts that evaluate music descriptors against ground-truth music collections.

Ideally, these tests should cover all common mid-level descriptors provided by Essentia, as long as we have proper ground truths for evaluation. The scripts should be generic enough to evaluate other music analysis tools apart from Essentia. A user should be able to specify the location of a ground-truth collection as a command-line parameter of the script. The scripts should generate reports in an easily readable and understandable format. For ground truths, preference is given to collections with audio in the public domain, although we'll consider adding scripts for our in-house collections too.
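To make the requirements above more concrete, here is a rough sketch of what such a script could look like. It is only an illustration under several assumptions that are not part of this proposal: the ground truth and the estimates are each stored as one JSON file per track (e.g. `track01.json` containing `{"key": "C major"}`), the two directories use matching file names, and a descriptor counts as correct only on an exact match. All function names, argument names and the file layout are hypothetical. Reading estimates from files rather than calling Essentia directly is what would keep the script usable with other analysis tools.

```python
import argparse
import json
from pathlib import Path


def load_labels(directory):
    """Read one JSON file per track and map the track name to its contents.

    Assumed (hypothetical) layout: <directory>/<track>.json
    """
    return {
        path.stem: json.loads(path.read_text())
        for path in sorted(Path(directory).glob("*.json"))
    }


def evaluate(truth, estimates, descriptor):
    """Compare estimated and annotated values of one descriptor.

    Exact match is a simplification; numeric descriptors such as BPM
    would need a tolerance instead.
    """
    rows = []
    for track, annotation in truth.items():
        expected = annotation.get(descriptor)
        if expected is None:
            continue  # track is not annotated for this descriptor
        found = estimates.get(track, {}).get(descriptor)
        rows.append((track, expected, found, found == expected))
    return rows


def format_report(rows, descriptor):
    """Build a plain-text report: a summary followed by one line per track."""
    correct = sum(1 for row in rows if row[3])
    lines = [f"Descriptor: {descriptor}", f"Tracks evaluated: {len(rows)}"]
    if rows:
        lines.append(f"Accuracy: {correct / len(rows):.1%}")
    for track, expected, found, ok in rows:
        status = "OK  " if ok else "FAIL"
        lines.append(f"  {status} {track}: expected={expected} estimated={found}")
    return "\n".join(lines)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Evaluate a music descriptor against a ground-truth collection."
    )
    parser.add_argument("ground_truth_dir", help="directory with annotation JSON files")
    parser.add_argument("estimates_dir", help="directory with estimate JSON files")
    parser.add_argument("--descriptor", default="key", help="descriptor name to evaluate")
    args = parser.parse_args(argv)

    rows = evaluate(
        load_labels(args.ground_truth_dir),
        load_labels(args.estimates_dir),
        args.descriptor,
    )
    print(format_report(rows, args.descriptor))
```

Calling `main()` at the bottom of the file would turn this into a command-line script, invoked for example as `python evaluate_descriptor.py /path/to/ground_truth /path/to/estimates --descriptor key` (the script name is made up). A real implementation would need descriptor-specific comparison rules in place of the exact match used here.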

List of music descriptors for QA evaluation (to be updated). I've added, in parentheses, some people who could provide feedback on these tests:

Key and scale (@angelfaraldo)
Chords (@angelfaraldo)
Beats, BPM (@ffont)
Onset rate (@MartinHN)
Danceability

Music descriptors QA scripts wishlist #406