craffel / mir_eval

Evaluation functions for music/audio information retrieval/signal processing algorithms.
MIT License

Loosen separation test tolerance #302

Closed craffel closed 5 years ago

craffel commented 5 years ago

For some tests, the difference between the scores produced by Travis' build and my own build is as large as 3.89588121e-04, which was causing Travis to fail. In practice, we shouldn't care much about differences smaller than 10e-3 for separation tasks, since it would be very unusual for anyone to pay attention to differences this small when comparing source separation algorithms.
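
As a rough illustration of what loosening the tolerance means, the sketch below compares two score lists under an absolute tolerance. The helper `scores_match`, the score values, and the strict tolerance of 1e-4 are hypothetical and not taken from mir_eval's test suite; only the 3.89588121e-04 deviation comes from the comment above, and `10e-3` is read literally as the Python float 0.01, which is an assumption about the intended value.

```python
import math

STRICT_ATOL = 1e-4   # assumed "tight" tolerance, for illustration only
LOOSE_ATOL = 10e-3   # the literal from the comment above; equals 0.01 in Python


def scores_match(expected, actual, atol):
    """Return True if every pair of scores differs by at most `atol`."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=0.0, abs_tol=atol)
        for e, a in zip(expected, actual)
    )


# Made-up scores; the second pair differs by the deviation quoted above.
reference = [5.25, 7.5]
observed = [5.25, 7.5 + 3.89588121e-04]

strict_ok = scores_match(reference, observed, STRICT_ATOL)  # fails the tight check
loose_ok = scores_match(reference, observed, LOOSE_ATOL)    # passes the loose check
```

Setting `rel_tol=0.0` makes the check purely absolute, which matches how the deviations are reported in this thread.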

bmcfee commented 5 years ago

LGTM; I think this is related to our previous headaches deriving from BLAS discrepancies across platforms, but I agree that anything past the 3rd (2nd?) decimal place is unreliable in source sep anyway.
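
One well-known way such platform discrepancies can arise (an assumption here, not something the thread confirms for this case) is that floating-point addition is not associative, so libraries that accumulate sums in a different order can return slightly different results. A minimal pure-Python sketch, with a hypothetical `accumulate` helper:

```python
def accumulate(values):
    """Sum values strictly left to right, one addition at a time."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]

forward = accumulate(values)             # (0.1 + 0.2) + 0.3
backward = accumulate(reversed(values))  # (0.3 + 0.2) + 0.1

# Same numbers, different order, different rounding.
order_matters = forward != backward
gap = abs(forward - backward)
```

The gap in this toy case is tiny. Deviations of the size reported in the thread would need such effects to be amplified by later numerical steps, which this sketch does not attempt to reproduce.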

craffel commented 5 years ago

Amazingly, this is still not loose enough for the Python 3.4 build: https://travis-ci.org/craffel/mir_eval/jobs/477467832 The max absolute deviation between my system and Travis' system is 4.87191262e-03. What do we think about this? @faroit
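
When picking a tolerance, it can help to measure the worst-case deviation directly instead of only seeing a pass/fail result. The sketch below does that for two score lists. The helper `max_abs_deviation`, the score values, and the candidate tolerances are hypothetical; only the 4.87191262e-03 figure is taken from the comment above.

```python
def max_abs_deviation(expected, actual):
    """Return (index, deviation) for the largest absolute difference."""
    if len(expected) != len(actual) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    deviations = [abs(e - a) for e, a in zip(expected, actual)]
    worst = max(range(len(deviations)), key=deviations.__getitem__)
    return worst, deviations[worst]


# Made-up scores; the second entry differs by the deviation quoted above.
local_scores = [10.0, 3.0, -2.0]
ci_scores = [10.0 + 1e-05, 3.0 - 4.87191262e-03, -2.0]

index, deviation = max_abs_deviation(local_scores, ci_scores)

# Which illustrative absolute tolerances would accept this deviation?
passes_at = {atol: deviation <= atol for atol in (1e-3, 1e-2)}
```

Reporting the index alongside the deviation makes it easier to see which score is responsible when a build fails.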

faroit commented 5 years ago

Do we still need to support 3.4?

I totally agree to loosen it further. Bsseval already caused way too much trouble over here ;-)

craffel commented 5 years ago

> Do we still need to support 3.4?

Not necessarily, but we have just not updated Travis. That this affects 3.4 and not, say, 3.5 or 3.6 is mostly a coincidence I think.

> I totally agree to loosen it further. Bsseval already caused way too much trouble over here ;-)

Ok, will do.

bmcfee commented 5 years ago

> Not necessarily, but we have just not updated Travis. That this affects 3.4 and not, say, 3.5 or 3.6 is mostly a coincidence I think.

I dropped 3.4 builds on travis (in librosa) for exactly this reason. Since mir_eval doesn't use any fancy features of >=3.5 (except, maybe, implicitly ordered dictionaries?), I don't see much point in keeping the 3.4 test around.
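
On the ordered-dictionary point: a plain `dict` preserves insertion order as a language guarantee only from Python 3.7 onward (it is an implementation detail of CPython 3.6). Code that must behave the same on older interpreters can use `collections.OrderedDict` explicitly. A small sketch with made-up values, not taken from mir_eval's code:

```python
from collections import OrderedDict

# Insertion order is preserved on every Python 3 version, including 3.4.
scores = OrderedDict()
scores["SDR"] = 5.2   # made-up value
scores["SIR"] = 11.8  # made-up value
scores["SAR"] = 7.4   # made-up value

metric_names = list(scores)
```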