craffel / mir_eval

Evaluation functions for music/audio information retrieval/signal processing algorithms.
MIT License

Entropy Based Evaluation of unlabelled sections #365

Open davies-w opened 8 months ago

davies-w commented 8 months ago

Dear MIR community,

I've been revisiting the reference Professor McFee gave in https://github.com/craffel/mir_eval/issues/363#issuecomment-1715848028, and it's still not clear to me that either Onset or Hit Rate is the right metric for our problem (N unlabeled estimated boundaries vs. M human boundaries), primarily because the tolerance factor means the score is not a smooth function of boundary position.

I've actually just started using an entropy measure to help fit estimated boundaries to 16-bar boundaries, and realized that this approach might be useful for measuring estimated boundaries against reference boundaries as well. I'm not very mathematically equipped, but the algorithm would go something like this:

Each boundary contains beat-aligned intervals. The entropy H(p, n) = (p / (p + n)) log(p / (p + n)) is calculated by taking the first estimated boundary, treating the intervals inside it as "p" and all the rest as "n". We then project those into the reference boundaries and compute the entropy. E.g., 0 0 0 0 1 1 1 1 2 2 2 2 vs. 0 0 0 0 0 0 1 1 1 1 1 1 would give H(4, 6) + H(0, 6) + H(2, 6) + H(2, 6) + H(0, 6) + H(4, 6). However, we'd want both directions, so we would also have H(4, 4) + H(2, 4) + H(0, 4) + H(0, 4) + H(2, 4) + H(4, 4), and we'd combine the two using some aggregate (sum, min, average).
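To make the example concrete, here is a rough Python sketch of one possible reading of it. The function names (`h`, `projection_pairs`, `directional_score`) are made up for illustration. It assumes the second argument of each H term is the length of the section being projected into, as in the worked example (6 and 4), that the formula is applied exactly as written (no leading minus sign, no second term), and that 0 log 0 is treated as 0. None of these choices is settled by the description above.

```python
import math
from collections import Counter


def h(p, n):
    """Literal reading of H(p, n) = (p / (p + n)) * log(p / (p + n)).

    Assumption: 0 * log(0) is taken to be 0.
    """
    if p == 0:
        return 0.0
    q = p / (p + n)
    return q * math.log(q)


def projection_pairs(source, target):
    """List the (p, n) pairs for projecting `source` sections into `target`.

    `source` and `target` are per-beat section labels of equal length.
    For each source section and each target section, p is the number of
    beats they share and n is the length of the target section.
    """
    if len(source) != len(target):
        raise ValueError("label sequences must have the same length")
    target_sizes = Counter(target)
    overlap = Counter(zip(source, target))  # missing pairs count as 0
    return [
        (overlap[(s, t)], target_sizes[t])
        for s in sorted(set(source))
        for t in sorted(set(target))
    ]


def directional_score(source, target):
    """Sum of H over all (p, n) pairs in one direction."""
    return sum(h(p, n) for p, n in projection_pairs(source, target))


est = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
ref = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

forward = directional_score(est, ref)   # estimated projected into reference
backward = directional_score(ref, est)  # reference projected into estimated

# Candidate aggregates mentioned in the post: sum, min, average.
aggregates = {
    "sum": forward + backward,
    "min": min(forward, backward),
    "average": (forward + backward) / 2,
}
```

With these inputs, `projection_pairs(est, ref)` reproduces the (4, 6), (0, 6), (2, 6), (2, 6), (0, 6), (4, 6) terms and `projection_pairs(ref, est)` reproduces the reverse-direction terms from the example.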

I don't want to reinvent the wheel here, so if anyone has any ideas why this is a bad idea, or if it's been done before, I'd love feedback. To reiterate, the goal is an objective function measuring closeness of sectioning, not the precision/recall of hit/not a hit. In the tolerance world, the precision and recall of the above sectioning are zero (IIUC).
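For comparison, a simplified tolerance-based hit rate on the same example might look like the sketch below. This is not mir_eval's implementation: `hit_rate` is a hypothetical helper using a simple greedy one-to-one match, boundaries and tolerance are expressed in beats rather than seconds, and only the interior boundaries are compared (the shared start and end are left out). Under those assumptions the score stays at zero until the tolerance reaches 2 beats and then jumps, which is the non-smooth behaviour described above.

```python
def hit_rate(ref_bounds, est_bounds, tol):
    """Precision and recall of estimated boundaries within `tol` of a reference.

    Simplified greedy one-to-one matching over sorted boundary positions.
    """
    ref = sorted(ref_bounds)
    est = sorted(est_bounds)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tol:
            hits += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1
        else:
            i += 1
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Interior boundaries (in beats) of the example sectionings.
ref_interior = [6]      # 0 0 0 0 0 0 | 1 1 1 1 1 1
est_interior = [4, 8]   # 0 0 0 0 | 1 1 1 1 | 2 2 2 2

tight = hit_rate(ref_interior, est_interior, tol=1)
loose = hit_rate(ref_interior, est_interior, tol=2)
```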