Open dbogdanov opened 8 years ago
Ideally, we should re-use mir_eval, the Python library for computing common heuristic accuracy scores for various music/audio information retrieval/signal processing tasks.
http://craffel.github.io/mir_eval/#quickstart-using-mir-eval-in-python-code
Python unit tests provide basic sanity checks for descriptors, but they are not always sufficient to assess a descriptor's intrinsic quality or its suitability for practical applications. We are considering adding Python scripts for evaluating music descriptors against ground-truth music collections.
Ideally, these tests should cover all common mid-level descriptors provided by Essentia, as long as we have proper ground truths for evaluation. The scripts should be generic enough to evaluate other music analysis tools apart from Essentia. A user should be able to specify the location of a ground-truth collection as a command-line parameter of the script. The scripts should generate reports in an easily readable format. Preference is given to ground-truth collections with audio in the public domain, although we'll consider adding scripts for our in-house collections too.
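To make the requirements above concrete, here is a minimal sketch of what such a script could look like. Everything in it is an assumption made for illustration, not an agreed design: the one-file-per-track layout, the `.bpm` suffix, the function names, and the choice of a tempo check with a 4% tolerance as the example metric. It only reads precomputed estimates from files, so it does not depend on Essentia and could score the output of any tool. A metric from a library such as mir_eval could be passed in instead of the toy one, provided the loader returns data in the shape that metric expects.

```python
import argparse
import json
from pathlib import Path


def load_values(directory, suffix):
    """Map track id (file stem) to the single float stored in each file.

    Assumed layout (hypothetical): one text file per track, e.g. track01.bpm.
    """
    return {p.stem: float(p.read_text().strip())
            for p in sorted(Path(directory).glob("*" + suffix))}


def tempo_within_tolerance(reference, estimate, tolerance=0.04):
    """Example metric: is the estimate within +/- tolerance of the reference?"""
    return abs(estimate - reference) <= tolerance * reference


def evaluate(references, estimates, metric=tempo_within_tolerance):
    """Score every track that has both a reference and an estimate."""
    common = sorted(set(references) & set(estimates))
    per_track = {t: bool(metric(references[t], estimates[t])) for t in common}
    accuracy = sum(per_track.values()) / len(common) if common else None
    return {
        "tracks_evaluated": len(common),
        # Tracks with ground truth but no estimate are reported, not scored.
        "tracks_missing": sorted(set(references) - set(estimates)),
        "accuracy": accuracy,
        "per_track": per_track,
    }


def main(argv=None):
    """Parse the collection locations from the command line and print a report."""
    parser = argparse.ArgumentParser(
        description="Evaluate descriptor estimates against a ground-truth collection.")
    parser.add_argument("ground_truth_dir", help="directory with reference annotations")
    parser.add_argument("estimates_dir", help="directory with estimates from any tool")
    parser.add_argument("--suffix", default=".bpm", help="annotation file suffix")
    args = parser.parse_args(argv)
    report = evaluate(load_values(args.ground_truth_dir, args.suffix),
                      load_values(args.estimates_dir, args.suffix))
    print(json.dumps(report, indent=2))
    return report
```

To use this as a script, call `main()` under the usual `if __name__ == "__main__":` guard and pass the two directories on the command line. The JSON report is only one possible output format; a plain-text or HTML summary may be easier to read for larger collections.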
List of music descriptors for QA evaluation (to be updated; I've listed some people who could provide feedback on these tests):