More information on multipitch evaluation?

craffel / mir_eval

Evaluation functions for music/audio information retrieval/signal processing algorithms.

MIT License


More information on multipitch evaluation? #335

Open maxpv opened 3 years ago

maxpv commented 3 years ago

I'm trying to understand the multipitch metrics. The documentation points to two papers, but I couldn't find anything in them that explains the metrics in this output:

OrderedDict([('Precision', 0.5),
             ('Recall', 0.5),
             ('Accuracy', 0.3333333333333333),
             ('Substitution Error', 0.5),
             ('Miss Error', 0.0),
             ('False Alarm Error', 0.0),
             ('Total Error', 0.5),
             ('Chroma Precision', 0.5),
             ('Chroma Recall', 0.5),
             ('Chroma Accuracy', 0.3333333333333333),
             ('Chroma Substitution Error', 0.5),
             ('Chroma Miss Error', 0.0),
             ('Chroma False Alarm Error', 0.0),
             ('Chroma Total Error', 0.5)])

In the first paper from 2007 there are two sections in the Transcription Results section: Frame-level transcription (5.1) and Note onset detection (5.2). Due to the format of the input for multipitch.evaluate (frequencies associated with an onset) I suppose the 5.2 was mentioned. There's literally nothing in it that explains the metrics.

What am I missing? It seems unnecessarily obscure to me.

justinsalamon commented 3 years ago

@rabitt

rabitt commented 3 years ago

Hey @maxpv

> In the first paper from 2007 there are two sections in the Transcription Results section: Frame-level transcription (5.1) and Note onset detection (5.2).

In the documentation, the paper you mentioned and a second one are cited. The equations for all the metrics are there: Equations 3-6 in the first paper, and Equations 1-8 in the second. Both papers give pretty lengthy explanations of the metrics if you want more details.
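For readers following along, here is a minimal sketch of the frame-level definitions as they are commonly stated (precision, recall, accuracy, and the substitution / miss / false-alarm error decomposition). It is not mir_eval's code: `frame_metrics` and `count_matches` are hypothetical helpers, and the greedy half-semitone matching is a simplifying assumption. The two-frame input is made up; it just happens to give the same non-chroma numbers as the output quoted at the top of the thread.

```python
import math


def hz_to_semitones(freq_hz):
    # Log-frequency scale in which one unit equals one semitone.
    return 12.0 * math.log2(freq_hz)


def count_matches(ref_freqs, est_freqs, tolerance=0.5):
    """Greedy one-to-one matching of pitches within `tolerance` semitones.

    Assumption: a simple greedy pass over sorted pitches; a library
    implementation may use a more careful matching procedure.
    """
    est_left = sorted(hz_to_semitones(f) for f in est_freqs)
    matches = 0
    for ref in sorted(hz_to_semitones(f) for f in ref_freqs):
        for i, est in enumerate(est_left):
            if abs(ref - est) <= tolerance:
                del est_left[i]  # each estimate can be used only once
                matches += 1
                break
    return matches


def frame_metrics(ref_frames, est_frames):
    """ref_frames / est_frames: one list of active frequencies (Hz) per frame."""
    n_ref = [len(r) for r in ref_frames]
    n_est = [len(e) for e in est_frames]
    n_corr = [count_matches(r, e) for r, e in zip(ref_frames, est_frames)]

    total_ref, total_est, tp = sum(n_ref), sum(n_est), sum(n_corr)

    def ratio(num, den):
        return num / den if den else 0.0

    frames = list(zip(n_ref, n_est, n_corr))
    return {
        "Precision": ratio(tp, total_est),
        "Recall": ratio(tp, total_ref),
        # TP / (TP + FP + FN)
        "Accuracy": ratio(tp, total_ref + total_est - tp),
        "Substitution Error": ratio(sum(min(r, e) - c for r, e, c in frames), total_ref),
        "Miss Error": ratio(sum(max(0, r - e) for r, e, _ in frames), total_ref),
        "False Alarm Error": ratio(sum(max(0, e - r) for r, e, _ in frames), total_ref),
        "Total Error": ratio(sum(max(r, e) - c for r, e, c in frames), total_ref),
    }


# Two frames, one reference pitch each; the second estimate is wrong.
metrics = frame_metrics([[440.0], [440.0]], [[440.0], [523.25]])
for name, value in metrics.items():
    print(name, value)
```

With one correct pitch out of two, precision and recall are both 0.5, accuracy is 1/3, and the single wrong pitch counts as a substitution rather than a miss or a false alarm because the reference and estimate have the same number of pitches in that frame. The chroma variants apply the same counts after folding pitches into a single octave.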

> Due to the format of the input for multipitch.evaluate (frequencies associated with an onset) I suppose the 5.2 was mentioned.

There's no notion of onsets in mir_eval.multipitch.evaluate. Note-level metrics are implemented separately in mir_eval.transcription.

Hope that helps clarify things.

maxpv commented 3 years ago

Thanks for your detailed answer.

I think this should be added to the documentation. It is not obvious that Section 5.1 (Frame-Level Transcription) is related to the metrics we are trying to compute with multipitch or transcription, partly because of the input format: frequencies and timestamps, as opposed to an N×T matrix.

Example from Section 5.1: TP (“true positives”) is the number of correctly transcribed voiced frames (over all notes). OK, but what is a frame when the input is a list of intervals and pitches? To do that we need to set offset_min_tolerance, but this parameter isn't exposed in the docs for mir_eval.transcription.evaluate.
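One way to make "frame" concrete for a note list is to sample the notes on a regular time grid. The sketch below does that under stated assumptions: `notes_to_frames` is a hypothetical helper (not part of mir_eval), the hop size is arbitrary, and a note counts as active on the half-open interval from its onset up to, but not including, its offset.

```python
def notes_to_frames(intervals, pitches, hop=0.25):
    """Sample a note list on a regular time grid.

    intervals: list of (onset, offset) pairs in seconds
    pitches:   list of frequencies in Hz, one per interval
    hop:       frame spacing in seconds (assumed value, choose to taste)

    Returns (times, freqs), where freqs[i] lists the pitches sounding at times[i].
    """
    if not intervals:
        return [], []
    end = max(offset for _, offset in intervals)
    n_frames = int(round(end / hop)) + 1
    times = [i * hop for i in range(n_frames)]
    freqs = [
        [p for (onset, offset), p in zip(intervals, pitches) if onset <= t < offset]
        for t in times
    ]
    return times, freqs


# Two overlapping notes: 440 Hz on [0.0, 1.0) and 660 Hz on [0.5, 1.5).
times, freqs = notes_to_frames([(0.0, 1.0), (0.5, 1.5)], [440.0, 660.0], hop=0.25)
for t, f in zip(times, freqs):
    print(t, f)
```

The result has the "timestamps plus a list of frequencies per timestamp" shape that frame-level evaluation works on. Passing it to mir_eval.multipitch.evaluate would likely require converting the times and each per-frame list to NumPy arrays first.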