craffel / mir_eval

Evaluation functions for music/audio information retrieval/signal processing algorithms.
MIT License
588 stars 109 forks source link

Generalized melody metrics #317

Closed rabitt closed 4 years ago

rabitt commented 5 years ago

Myself and @juanjobosch have a recently accepted ISMIR paper proposing an extension of the existing melody metrics (allows for continuous, rather than binary, voicing estimations). The new metrics are equivalent to the existing ones when binary voicings are given.

Can we add these new metrics to mir_eval? They remove the need for reporting "unvoiced" frames as negative frequency values, which would clean up the underlying code a lot. But, it would change the existing API - right now the input to mir_eval.melody is ref_times, ref_freqs, est_times, est_freqs. To use the new metrics, it would need ref_times, ref_freqs, ref_voicing, est_times, est_freqs, est_voicing.

Alternatively, we could keep mir_eval.melody as-is and instead create mir_eval.f0 with the new metrics. The metrics are already used for more tasks than melody (e.g. pitch tracking, bass tracking), so f0 could be a nice generalization.

@craffel , what do you think?

justinsalamon commented 5 years ago

Quick comment - @juanjobosch also proposed some melody metrics that factor into account pitch continuity back in 2016. Would it be worth considering adding those in as well?

In both cases this diverges from the "standard" metrics used in MIREX, but I think these metrics are not sufficient, so if we think this can encourage better eval in this area I'd be all for adding both the soft voicing metrics and pitch continuity metrics.

I might however, be worth supporting backwards compatibility (neg freqs for voicing) for the sake of keeping the API compatible with MIREX and older algs?

craffel commented 5 years ago

Some additional (ugly) options:

  1. Make the args ref_times, ref_freqs, est_times, est_freqs, ref_voicing=None, est_voicing=None so that these new inputs are optional.
  2. Make ref_freq optionally be 2D. When 1D, assume the format where a negative value indicates unvoiced. When 2D, Row 1 is (strictly positive) frequency estimate and row 2 is the voicing.

I generally prefer not to change the API of existing functions, but if you expect your new metrics to be a very common use-case then I could be convinced. Note that you are not actually changing the API for any of the metric functions, only for to_cent_voicing which is a preprocessing function.

If your new metrics are significantly different from the existing ones (so that minimal code could be shared), mir_eval.f0 is an OK idea. Otherwise, I am not inclined to add an additional module.

To @justinsalamon's point, I agree that either way we must maintain backward compatibility.

justinsalamon commented 5 years ago

Between options 1 and 2 perhaps 1 is better?

lostanlen commented 4 years ago

closed by #324 ?