craffel / mir_eval

Evaluation functions for music/audio information retrieval/signal processing algorithms.
MIT License

key detection module returns score=0 when tested key is a fifth below reference key #204

Closed steiml closed 8 years ago

steiml commented 8 years ago

Hi,

I just used the library for evaluating key detection and stumbled upon this behavior which looks odd to me:

Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import mir_eval
>>> mir_eval.__version__
'0.3'
>>> mir_eval.key.evaluate('C major', 'G major')
OrderedDict([('Weighted Score', 0.5)])
>>> mir_eval.key.evaluate('G major', 'C major')
OrderedDict([('Weighted Score', 0.0)])

On the MIREX site it says that keys which "differ" by a fifth should be given a score of 0.5 - at least that's how I understand it:

"Keys will be considered as "close" if they have one of the following relationships: distance of perfect fifth [...]" (http://www.music-ir.org/mirex/wiki/2016:Audio_Key_Detection)

I read this as follows: if the reference key is C major, then a detected key of either G major or F major should be assigned a score of 0.5.
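The two calls above differ only in which key is treated as the reference. A minimal sketch of the interval arithmetic, assuming tonics are reduced to pitch classes (C = 0 ... B = 11) and the estimate is measured in semitones upward from the reference. `PITCH_CLASS` and `semitones_up` are illustrative names, not part of mir_eval:

```python
# Illustrative only: pitch classes for the natural notes (C = 0 ... B = 11).
PITCH_CLASS = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}


def semitones_up(reference_tonic, estimated_tonic):
    """Semitones from the reference tonic up to the estimated tonic, modulo 12."""
    return (PITCH_CLASS[estimated_tonic] - PITCH_CLASS[reference_tonic]) % 12


print(semitones_up('C', 'G'))  # 7: the estimate is a perfect fifth above the reference
print(semitones_up('G', 'C'))  # 5: the estimate is a fifth below (a fourth above)
print(semitones_up('C', 'F'))  # 5: same situation, F is a fifth below C
```

A rule that rewards only an interval of 7 gives the first case 0.5 and the other two 0, which matches the output reported above.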

Best, Michael Stein

stefan-balke commented 8 years ago

Hi Michael,

I think you are right and this is a bug imho.

Relevant line in the code is this one: https://github.com/craffel/mir_eval/blob/master/mir_eval/key.py#L129

Maybe citing Elaine Chew from some years ago clarifies things (http://www.music-ir.org/mirex/wiki/2005:Audio_and_Symbolic_Key):

Assumption of closeness: Perfect 5th: Is this generally accepted as an almost similar key?
[Arpi 02.08.05]: Yes it is. Please refer to http://www-rcf.usc.edu/~echew/papers/CiM2003 for further details.
[EC 02.08.05]: Keys a perfect fifth apart share all but one pitch (with the differing pitches being only one half step apart). The above paper describes three models for tonality (by Krumhansl, Lerdahl and Chew) with similar relative distances between keys which are consistent with that mentioned in our proposal.

So, C maj and F maj and G maj should have the same relationship and thus the same score.

Any opinions against this?

Otherwise it's an easy fix.

bmcfee commented 8 years ago

I seem to recall some argument for treating ascending and descending fifths differently, but maybe that was in chord eval. Searching through the issues turned up nothing -- maybe @craffel or @ejhumphrey know what I'm talking about?

stefan-balke commented 8 years ago

Looking at the history of the source-code there was not much of a discussion :)

I looked through the test cases, and they only cover an ascending fifth; a descending one is missing.

craffel commented 8 years ago

I believe this is known behavior and matches the behavior of the MIREX code; we discussed it in the pull request: https://github.com/craffel/mir_eval/pull/181 It is also a bit odd to me personally, but I don't know anything about music theory so I defer to experts. Unless there is community consensus that this is wrong (and the MIREX code is too), we will leave it as-is.

justinsalamon commented 8 years ago

I think it was me who was arguing that this made sense (or at least seemed sensible), here's the relevant bit:

I imagine that if the estimated key is a perfect fifth above the true key then you get 0.5 because they are highly related (tonally). The circle of fifths is not symmetric: the fifth of C is G, but the fifth of G is D. So if the true key is a C, estimating a G (a fifth) is closely related, but estimating a perfect fifth down (an F) is not as tonally related (F is the 4th for C, not the 5th) and hence not rewarded at all.

Estimating a G below the C (i.e. a perfect 4th down) should still get rewarded, but since this is key estimation I imagine this is octave-agnostic, right? Hence you only have to look for a fifth "up" (in reality it's a chroma circle, so there is no up and down).

Basically, the relationships between scales are not symmetric: estimating a tonic of G when it's really C is not the same as estimating a tonic of C when it's really G.
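The octave-agnostic point in the quote can be checked with MIDI note numbers. This is a sketch assuming the usual convention that middle C (C4) is 60 and each semitone adds 1; `pitch_class_interval` is an illustrative helper, not a mir_eval function:

```python
def pitch_class_interval(reference_note, estimated_note):
    """Interval from reference to estimate in semitones, ignoring octaves."""
    return (estimated_note - reference_note) % 12


# MIDI note numbers (assumed convention: C4 = 60).
C4, G4, G3, F3, F4 = 60, 67, 55, 53, 65

# G above C (a fifth up) and G below C (a fourth down) are the same pitch-class step.
print(pitch_class_interval(C4, G4))  # 7
print(pitch_class_interval(C4, G3))  # 7

# F is a different step, whether it is taken below or above C.
print(pitch_class_interval(C4, F3))  # 5
print(pitch_class_interval(C4, F4))  # 5
```

Once octaves are folded away, "up" and "down" disappear and only the direction from reference to estimate remains, which is why swapping the two keys changes the result.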

bmcfee commented 8 years ago

On the MIREX site it says that keys which "differ" by a fifth should be given a score of 0.5 - at least that's how I understand it:

That's a little vague, but consistent with what @justinsalamon's describing.

FWIW, the mir_eval documentation for key estimation is more precise, but could perhaps still be improved.

stefan-balke commented 8 years ago

Basically, the relationships between scales are not symmetric: estimating a tonic of G when it's really C is not the same as estimating a tonic of C when it's really G.

Well, it depends on the definition. If we follow Elaine's argument, then any two scales which share all but one note are closely related (Cmaj-Fmaj share all but Bb, Cmaj-Gmaj share all but F#, etc.). Also, the symmetry is re-established if you talk about going a fifth up or a fifth down instead of a fifth up or a fourth up. But that's just my 2 cents...
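The shared-note argument is easy to verify by comparing pitch-class sets. A small sketch, assuming major scales built from the standard whole/half step pattern and pitch classes numbered C = 0 ... B = 11; the helper names are illustrative:

```python
# Semitone offsets of the major scale degrees from the tonic.
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)


def major_scale(tonic):
    """Pitch-class set of the major scale on `tonic` (C = 0 ... B = 11)."""
    return {(tonic + step) % 12 for step in MAJOR_SCALE_STEPS}


C, D, F, G = 0, 2, 5, 7

for name, tonic in [('G', G), ('F', F), ('D', D)]:
    shared = major_scale(C) & major_scale(tonic)
    print(name, 'major shares', len(shared), 'of 7 pitch classes with C major')
```

G major and F major each share six pitch classes with C major, while D major (two fifths away) shares only five. By this measure the fifth above and the fifth below are equally close to the reference.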

stefan-balke commented 8 years ago

Sorry, can't let go of this. Here is a citation from:

Chuan, C.-H. and Chew, E., "Fuzzy Analysis in Pitch Class Determination for Polyphonic Audio Key Finding," in Proceedings of the 6th International Society for Music Information Retrieval Conference, London, UK, September 2005. ISBN: 0-9551179-0-9

[Screenshot of the cited passage, 2016-06-16; image not included]

The authors mention two different things:

This would explain why only ascending fifths are considered. But the logic is not complete yet for me...

steiml commented 8 years ago

I wasn't aware of the discussion related to the pull request. Personally, I'm more in favor of the symmetric approach, but I also understand why the "clockwise" approach makes sense. And if the MIREX evaluation is done that way, it should definitely stay in there to be consistent.

I'm wondering if it makes sense to add an option to the evaluator that also treats a fifth downwards as a close key, so that it is rewarded with a score of 0.5, too. I agree with Stefan that the fix would indeed be simple.

Does any regular MIREX key detection participant have an opinion they could share to shed some more light on this issue? A few years ago, I worked in the lab with Johan Pauwels, who did his PhD on key and chord detection and took part several times, but I'm afraid I don't have recent contact details. Or maybe someone from IRCAM or QMUL could comment; they seem to have been regular participants in recent years.
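The option floated above might look roughly like the following. This is only a sketch under stated assumptions: both keys have the same mode, tonics are pitch classes (C = 0 ... B = 11), and the other MIREX relationships are left out. `fifth_credit` and its `symmetric` flag are hypothetical and do not exist in mir_eval:

```python
def fifth_credit(reference_tonic, estimated_tonic, symmetric=False):
    """Partial-credit sketch for two keys of the same mode.

    Tonics are pitch classes (C = 0 ... B = 11). `symmetric` is a
    hypothetical option, not an existing mir_eval parameter.
    """
    interval = (estimated_tonic - reference_tonic) % 12
    if interval == 0:
        return 1.0  # same key
    if interval == 7:
        return 0.5  # estimate is a fifth above the reference
    if symmetric and interval == 5:
        return 0.5  # estimate is a fifth below the reference, only when opted in
    return 0.0


C, G = 0, 7
print(fifth_credit(C, G))                  # 0.5
print(fifth_credit(G, C))                  # 0.0
print(fifth_credit(G, C, symmetric=True))  # 0.5
```

Leaving the flag off by default would keep the scores consistent with the MIREX behavior, while letting users opt in to the symmetric reading.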

bmcfee commented 8 years ago

Just a reminder: this kind of thing would be perfect to discuss at the ISMIR unconference session, if any of y'all will be in attendance.

steiml commented 8 years ago

Great idea, I'll most certainly be there.

stefan-balke commented 8 years ago

Okay, I asked Ching-Hua Chuan about this, but the reasoning is still only that it's the perfect fifth, full stop.

I'm still missing a physical argument here, but the mir_eval implementation is correct, and that is what needed to be clarified.