Closed rabitt closed 4 years ago
@craffel per MIREX convention, melex algs typically output values in Hz (even if they internally work in cents, they'll typically convert to Hz for the final output). The instructions for MIREX 2019 seem unchanged with respect to previous years (algs need to output frequency values in Hz with 0/neg values to indicate unvoiced frames), so I guess the convention is still going strong.
I think it'd be great to support taking in a voicing vector, but agree we should remain backwards compatible.
Ok, in that case why did we design our metrics to take in cents in the first place?
My memory might fail me, but IIRC, they did originally take in Hz (at least in the very initial implementation pass), but since they involve an internal conversion to cents to compute the metrics, at some point we decided to factor the conversion out and provide utility functions. Still, our high-level functions such as evaluate still expect input in Hz.
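For context, the Hz-to-cents conversion those utility functions perform is a log-scaling relative to a reference frequency (mir_eval's hz2cents uses a 10 Hz reference by default, if I recall correctly). A minimal sketch, not the actual mir_eval code:

```python
import math

def hz2cents_sketch(freq_hz, base_frequency=10.0):
    """Convert a frequency in Hz to cents above `base_frequency`.

    Rough stand-in for a hz2cents-style utility; 0 Hz (unvoiced)
    maps to 0 cents, matching the unvoiced convention above.
    """
    if freq_hz <= 0:
        return 0.0
    return 1200.0 * math.log2(freq_hz / base_frequency)

# One octave above the reference is exactly 1200 cents:
print(hz2cents_sketch(20.0))  # → 1200.0
```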
The metrics used to (and still do in the PR) take frequency values in Hz. Internally they are converted to cents, but the conversion isn't necessary to compute the metrics. In the PR I removed the conversion to simplify the code, and the metrics are mathematically equivalent.
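The equivalence holds because the cents conversion is monotonic: a 50-cent tolerance on the difference of cents values is exactly a tolerance on the Hz frequency ratio, so the per-frame pitch-match decision comes out the same either way. A quick plain-Python check of that claim (illustrative, not mir_eval code):

```python
import math

def match_in_cents(ref_hz, est_hz, tol_cents=50.0):
    # Compare after converting both values to cents (10 Hz reference).
    ref_c = 1200.0 * math.log2(ref_hz / 10.0)
    est_c = 1200.0 * math.log2(est_hz / 10.0)
    return abs(est_c - ref_c) <= tol_cents

def match_in_hz(ref_hz, est_hz, tol_cents=50.0):
    # Equivalent test directly on the frequency ratio; the reference
    # frequency cancels: log2(est/10) - log2(ref/10) = log2(est/ref).
    return abs(1200.0 * math.log2(est_hz / ref_hz)) <= tol_cents

# The two decisions agree for any pair of positive frequencies:
for ref, est in [(220.0, 226.0), (220.0, 230.0), (440.0, 441.0)]:
    assert match_in_cents(ref, est) == match_in_hz(ref, est)
```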
The metrics used to (and still do in the PR) take frequency values in Hz.
Not sure what you mean - I think they currently take cents, e.g. https://github.com/craffel/mir_eval/blob/master/mir_eval/melody.py#L429 https://github.com/craffel/mir_eval/blob/master/mir_eval/melody.py#L494 https://github.com/craffel/mir_eval/blob/master/mir_eval/melody.py#L576 The evaluate function takes in Hz and converts to cents, but user code might call the metric functions directly instead of evaluate.
@craffel mmm yep you're right - I was thinking just about melody.evaluate. You and @lostanlen are right that while melody.evaluate is backwards compatible (if I change back cent_tolerance), the other functions are far from it.
Side note: the way the individual metrics are written in the current release makes them hard to use individually - if you want to call them using "standard" melody outputs (frequencies in Hz, negative frequencies indicating voicing, different time scales), you have to first manually call melody.to_cent_voicing before the individual metrics will work properly.
How should I proceed? I can rewrite the PR to be fully backwards compatible, but it will complicate the code a bunch. Or, can we do an in between PR with a deprecation warning on the individual metrics before this one is released? Or something else?
The way the individual metrics are written in the current release makes them hard to use individually - if you want to call them using "standard" melody outputs (frequencies in Hz, negative frequencies indicating voicing, different time scales), you have to first manually call melody.to_cent_voicing before the individual metrics will work properly.
I agree that it's cumbersome, but it's not unheard of in mir_eval - e.g. all the beat metrics require that you call trim_beats first, e.g.
https://github.com/craffel/mir_eval/blob/master/mir_eval/beat.py#L193
How should I proceed? I can rewrite the PR to be fully backwards compatible, but it will complicate the code a bunch. Or, can we do an in between PR with a deprecation warning on the individual metrics before this one is released? Or something else?
We haven't handled deprecation before in mir_eval. It's especially hard since they are being used as positional arguments. @bmcfee @lostanlen do you all have any experience/tips on how to do this from librosa?
We haven't handled deprecation before in mir_eval. It's especially hard since they are being used as positional arguments. @bmcfee @lostanlen do you all have any experience/tips on how to do this from librosa?
I don't remember librosa having changed the unit of one of its arguments. As far as I can tell, all we've done is remove kwargs and remove function names. Here, the only option I see is (sigh):
1) give a new name to the proposed validate, e.g. validate_cents.
2) copy-paste validate into a new name, e.g. validate_hertz.
3) raise a Warning every time the old validate is called, saying that it is deprecated and that validate will take cents in the future. Users should rename their call sites to validate_hertz if they want to retain the old behavior, and validate_cents if they want to adopt the new behavior.
4) make a new release
5) remove the validate function
6) make a new release.
7) wait a bit.
8) People complain about NameError: name 'validate' is not defined. We point them in the right direction.
9) make a new release
10) add a DeprecationWarning to validate_hertz
11) copy-paste validate_cents into a new name: validate
12) add a DeprecationWarning to validate_cents (renamed to validate)
13) make a new release
14) wait a long time.
15) Remove validate_hertz and validate_cents
16) make a new release
17) 🤞
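For what it's worth, the per-call warning in step 3 can be done with a thin transitional shim; a minimal sketch, with hypothetical function names and bodies (not actual mir_eval API):

```python
import warnings

def validate_hertz(ref_freq, est_freq):
    """Hypothetical old-style input check, expecting Hz values
    (negative values marking unvoiced frames)."""
    return len(ref_freq) == len(est_freq)

def validate_cents(ref_cent, est_cent):
    """Hypothetical new-style input check, expecting cents values
    (all non-negative)."""
    return all(c >= 0 for c in list(ref_cent) + list(est_cent))

def validate(ref, est):
    """Transitional shim: keeps the old Hz behavior but warns
    every caller, per step 3 of the plan above."""
    warnings.warn(
        "validate will expect cents in a future release; call "
        "validate_hertz to retain the old behavior or "
        "validate_cents to adopt the new one.",
        DeprecationWarning,
        stacklevel=2,
    )
    return validate_hertz(ref, est)
```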
Yes, that sounds about right, except we'd need to do that for all of the metric functions (not just validate).
@craffel @lostanlen Thanks for all your comments. I finally found some time to get back to this - I've redone this PR and it should now be fully backwards compatible, and ready for a re-review.
Note this is still missing tests for non-binary voicing, but I wanted to first check that the change is otherwise OK.
Any melody people want to chime in? @juanjobosch @justinsalamon
"melody people" 😍
supporting information attached: https://www.youtube.com/watch?v=6Sl-ad304Aw
a wild funky appears
Hey @craffel, sure! I can have a look in a couple of days.
Just added a bunch of tests (which caught a baddddd bug in the new code), so pending review, this is done on my side!
I'm tracking this, but totally swamped right now so @juanjobosch will probably get to this sooner than I do, but if not I'll give it a pass as soon as I can.
A first remark, which I think we commented on briefly offline some time ago, but it may be worth mentioning here too.
In the generalised metrics article, the voicing-related accuracies only consider non-binary voicing for the estimates (not for the annotations, and that's why this implementation uses ref_indicator as a binary indicator of ref_voicing). And the formulas of the pitch-related accuracies (RPA, RCA, OA) include a "reward" for correct pitch and chroma estimates, which can be made equivalent to ref_voicing, as is done in this implementation.
I think it may be OK to maintain the current terminology, as it probably makes it easier to keep the conceptual relation to the classic metrics (and therefore keep using e.g. "ref_voicing" in the RPA denominator, instead of e.g. "pitch_estimation_reward"), but I think it would be worth at least adding a comment in the functions where the terms in the paper's formulas differ from the implementation (RPA, RCA, OA), saying something like: the pitch estimation reward from [ref] is here made equivalent to ref_voicing.
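To make that terminology mapping concrete, here is how a generalized RPA reads with ref_voicing playing the role of the paper's pitch-estimation reward: per-frame correctness weighted by a (possibly non-binary) reference voicing. A plain-Python sketch under that assumption, not the mir_eval implementation:

```python
def raw_pitch_accuracy_sketch(ref_voicing, ref_cent, est_cent, tol=50.0):
    """Generalized RPA: correctness weighted by ref_voicing.

    ref_voicing holds per-frame weights in [0, 1], standing in
    for the "pitch estimation reward" term from the paper.
    """
    num = 0.0
    den = 0.0
    for v, rc, ec in zip(ref_voicing, ref_cent, est_cent):
        correct = 1.0 if abs(rc - ec) <= tol else 0.0
        num += v * correct
        den += v
    return num / den if den > 0 else 0.0

# Frames with low reference voicing weight contribute less to
# both numerator and denominator:
print(raw_pitch_accuracy_sketch([1.0, 0.2, 1.0],
                                [1200.0, 2400.0, 3600.0],
                                [1210.0, 2500.0, 3590.0]))
```

With binary ref_voicing this reduces to the classic RPA, which is the conceptual relation the comment above suggests preserving.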
@craffel I think this should be good to go now. Just noticing - what happened to this repo's tests? I'm seeing tests run on my fork, but not on this PR.
It looks like it's running on your PR but Travis isn't auto-commenting or anything. It also looks broken by other stuff :( https://travis-ci.org/github/craffel/mir_eval
display.piano_roll changing:

======================================================================
FAIL: test_display.test_pianoroll.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test-environment/lib/python3.5/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/craffel/mir_eval/tests/mpl_ic.py", line 56, in failer
    result = f(*args, **kwargs)
  File "/home/travis/build/craffel/mir_eval/tests/mpl_ic.py", line 219, in do_test
    ' (RMS %(rms).3f)' % err)
matplotlib.testing.exceptions.ImageComparisonFailure: images not close: /home/travis/build/craffel/mir_eval/tests/result_images/test_display/piano_roll.png vs. /home/travis/build/craffel/mir_eval/tests/result_images/test_display/piano_roll-expected.png (RMS 26.578)
Mind checking if this PR would cause that?
@craffel I rebased on master and found the culprit of the error you mentioned. I updated the internals of some of the mir_eval.display functions to fix this. Everything is happily passing now :)
Looks good, thanks so much for your hard work on this @rabitt !
Implements #317 , cc @juanjobosch
The metrics behave identically as before, but now users can optionally specify a voicing array, which overwrites any voicing coming from negative frequency values.
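The override behavior described above (an explicit voicing array taking precedence over sign-based voicing) can be sketched like this; the helper name and exact semantics here are illustrative, not mir_eval's actual API:

```python
def freq_to_voicing_sketch(frequencies, voicing=None):
    """Derive voicing from frequency signs unless an explicit
    voicing array is given, in which case it takes precedence."""
    abs_freqs = [abs(f) for f in frequencies]
    if voicing is not None:
        return abs_freqs, list(voicing)
    # Default convention: zero or negative frequency means unvoiced.
    return abs_freqs, [1.0 if f > 0 else 0.0 for f in frequencies]

# Without an explicit voicing array, the signs decide:
print(freq_to_voicing_sketch([220.0, -220.0, 0.0]))
# With one, it overrides the signs (and can be non-binary):
print(freq_to_voicing_sketch([220.0, -220.0, 0.0], [0.3, 0.9, 0.0]))
```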
Note that I (mostly) removed all conversion to cents, since the conversion cancels out in the metrics. The only place where cents values remain is in resample_melody_series: right now, when the frequency values are resampled, they're resampled in cents, not in Hz. If we were to instead resample in Hz, all of the metrics would go up (slightly) - but I left it as is for now.

This PR is missing: