craffel / mir_eval

Evaluation functions for music/audio information retrieval/signal processing algorithms.
MIT License

Merge common functionality across submodules into util #29

Closed craffel closed 10 years ago

craffel commented 10 years ago

Need to move code for computing false positives/negatives of event times over a window into util. It's used in both beat and onset in the respective f_measure functions. Is this the same as boundary.detection, @bmcfee?

bmcfee commented 10 years ago

Tough question: are the semantics of finding a boundary the same as those of detecting an onset/beat? boundary.detection now enforces that each estimated/reference boundary be used at most once in the precision/recall calculation, and I'm not sure if that's what you want for onsets.

If so, we can abstract out the guts of the boundary metric into a function util.window_matching, which computes the size of the maximal matching between reference and estimates within a specified window.

craffel commented 10 years ago

Onsets/beats only allow each onset/beat to be counted once. The difference is that it's done greedily, I think, but I'm not 100% sure whether that matters. We should discuss in person.

bmcfee commented 10 years ago

Ah. I suspect it doesn't matter too much for beats (as in segments), but it might be a big deal for onsets, especially if there are rapid sequences of onsets (eg in an arpeggio).

craffel commented 10 years ago

Can you walk through a degenerate case where it would matter? It seems like if you're just counting true/false positive/negatives it won't matter. But it would obviously matter if you care about distance to closest event.

craffel commented 10 years ago

We're going to fix on the non-greedy approach for all submodules.

bmcfee commented 10 years ago

Just to document the discussion from earlier: the greedy matching strategy can fail when the gap between two events is smaller than twice the window length. In these cases, an estimate could feasibly map to either reference event, and greedily mapping it to the closest one can result in a smaller global matching.

This probably does not affect boundary detection at 0.5 seconds. It might affect boundary detection at 3 seconds (<=6sec segments are not uncommon).

Beat tracking is probably safe, given the relatively small window length, but a degenerate beat tracker might output many events which cause trouble anyway. (Although, in this case, the score would probably be so low as to not matter if we have erroneous assignments...)

Onset detection is probably the most troublesome case, given the relatively lax constraints over the timing between subsequent events (compared to beats or segments, at least).
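
To make that failure mode concrete, here is a minimal sketch (with made-up event times) in which the two reference events sit closer together than twice the window, so a greedy nearest-first matcher finds one hit while the optimal matching finds two:

    import numpy as np

    # Hypothetical values: the gap between the references (0.2 s) is smaller
    # than twice the window (2 * 0.2 s).
    ref = np.array([0.0, 0.2])
    est = np.array([0.19, 0.35])
    window = 0.2

    # Greedy: match each estimate to the nearest still-unused reference.
    unused = list(range(len(ref)))
    greedy_hits = 0
    for e in est:
        if not unused:
            break
        nearest = min(unused, key=lambda i: abs(ref[i] - e))
        if abs(ref[nearest] - e) <= window:
            unused.remove(nearest)
            greedy_hits += 1

    print(greedy_hits)  # 1 -- est 0.19 grabs ref 0.2, stranding est 0.35

    # Optimal: est 0.19 <-> ref 0.0 (gap 0.19) and est 0.35 <-> ref 0.2
    # (gap 0.15) are both within the window, so the maximum matching has size 2.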

dpwe commented 10 years ago

I can't help wanting to know what the "non-greedy approach" is. Greedy is usually used as a heuristic when the optimal solution (i.e., the correspondence that minimizes total error) requires an exponential search, and no optimal polynomial algorithm is known. I don't suppose we're doing the exponential search. So what are we doing?

DAn.

craffel commented 10 years ago

It's here: https://github.com/craffel/mir_eval/blob/master/mir_eval/boundary.py#L114

Greedy algorithms are also used when the non-greedy algorithm is non-obvious, even when it isn't exponential.

dpwe commented 10 years ago

Or, put succinctly: "We find the answer using magic"

dpwe commented 10 years ago

I assume this tests out correctly?

    # L. Lovasz. On determinants, matchings and random algorithms.
    # In L. Budach, editor, Fundamentals of Computation Theory,
    # pages 565-574. Akademie-Verlag, 1979.
    #
    # If we build the skew-symmetric adjacency matrix
    #
    #   D[i, n_ref + j] = 1  <=> ref[i] within window of est[j]
    #   D[n_ref + j, i] = -1 <=> same
    #
    # then rank(D) = 2 * maximum matching
    #
    # This way, we find the optimal assignment of reference and
    # annotation boundaries.

I'm trying to figure this out...

If we have really broad matching windows, so that all of ref[i] are within the windows of all of est[j], then we end up with a block-constant square matrix:

    n_r rows [  0 | 1 ]
    n_e rows [ -1 | 0 ]

That looks like rank 2 to me, so max matching = 1, so the algorithm returns precision and recall = 1/n_boundaries

yet I think if n_r == n_e, you can make a complete assignment, so the best-case precision and recall should be 1.

If my interpretation is right, this algorithm is more like a worst-case matching, that penalizes ambiguous assignments, whereas I think the point of the metric is to find the best-case assignment.

The quick way to check is to see how the metrics vary as the window increases. The scores should go up, but I'm guessing they will go down.
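
A quick numerical check of that degenerate case (hypothetical sizes, just for illustration): with every reference inside every estimate's window, the skew-symmetric D has rank 2 no matter how many boundaries there are, even though a perfect one-to-one assignment clearly exists.

    import numpy as np

    # Every reference within the window of every estimate (hypothetical sizes).
    n_r = n_e = 4
    D = np.zeros((n_r + n_e, n_r + n_e))
    D[:n_r, n_r:] = 1    # D[i, n_ref + j] = 1
    D[n_r:, :n_r] = -1   # D[n_ref + j, i] = -1

    # rank(D) = 2, so the rank trick implies a matching of size 1,
    # even though pairing ref[i] with est[i] gives a matching of size 4.
    print(np.linalg.matrix_rank(D))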

DAn.

bmcfee commented 10 years ago

I can't help wanting to know what the "non-greedy approach" is. Greedy is usually used as a heuristic when the optimal solution (i.e., the correspondence that minimizes total error) requires an exponential search, and no optimal polynomial algorithm is known. I don't suppose we're doing the exponential search. So what are we doing?

Ok, let's formalize the problem. Given a set of estimated boundaries P and a set of reference boundaries R, we want to find the largest matching between P and R subject to the window constraint. This is an instance of maximum matching in a bipartite graph. Let the vertices be V = P + R and add edges (i, j) <=> |P[i] - R[j]| <= window. Then the goal is to find the largest subset of edges such that each vertex is contained in at most one edge. The size of this set (the matching) is M, the precision is M / |P|, and recall is M / |R|.

This problem is well-studied, and there exist several polynomial time algorithms to solve it. See: http://en.wikipedia.org/wiki/Matching_(graph_theory)#In_unweighted_bipartite_graphs
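
A sketch of that formalization (it uses networkx as an off-the-shelf maximum-matching solver purely for illustration; the boundary times are made up and this is not the code in mir_eval):

    import networkx as nx
    from networkx.algorithms import bipartite

    ref = [0.0, 5.2, 11.3]          # hypothetical reference boundaries (seconds)
    est = [0.1, 5.0, 11.7, 20.0]    # hypothetical estimated boundaries (seconds)
    window = 0.5

    # Bipartite graph with an edge (i, j) whenever |ref[i] - est[j]| <= window.
    G = nx.Graph()
    ref_nodes = [('ref', i) for i in range(len(ref))]
    est_nodes = [('est', j) for j in range(len(est))]
    G.add_nodes_from(ref_nodes)
    G.add_nodes_from(est_nodes)
    for i, r in enumerate(ref):
        for j, e in enumerate(est):
            if abs(r - e) <= window:
                G.add_edge(('ref', i), ('est', j))

    # Maximum-cardinality matching; the returned dict lists each matched pair twice.
    matching = bipartite.hopcroft_karp_matching(G, top_nodes=ref_nodes)
    M = sum(1 for node in matching if node[0] == 'ref')

    precision = M / len(est)
    recall = M / len(ref)
    print(M, precision, recall)  # 3 matches -> precision 0.75, recall 1.0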

bmcfee commented 10 years ago

That looks like rank 2 to me, so max matching = 1, so the algorithm returns precision and recall = 1/n_boundaries

Yup, you're right. Looks like I misinterpreted the paper I pulled this idea from!

However, the maximum matching idea is still correct, and we can plug in a Hopcroft-Karp solver to compute it correctly.

dpwe commented 10 years ago

OK, so we need a maximum bipartite matching, and the Augmenting Path Algorithm appears to be the conceptually simplest (least efficient) polynomial algorithm. Anyone have a copy of West, Douglas Brent (1999), Introduction to Graph Theory (2nd ed.) to hand?
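
For reference, a bare-bones augmenting-path matcher fits in a few lines; this is an illustrative sketch (Kuhn's algorithm), not the code that ended up in mir_eval:

    def max_bipartite_matching(adj):
        """adj[i] is the list of estimate indices within the window of reference i.

        Returns the size of a maximum matching via augmenting paths.
        """
        n_est = 1 + max((j for edges in adj for j in edges), default=-1)
        match_est = [-1] * n_est  # match_est[j] = reference currently matched to estimate j

        def augment(i, visited):
            # Try to match reference i, re-routing already-matched references if needed.
            for j in adj[i]:
                if j not in visited:
                    visited.add(j)
                    if match_est[j] == -1 or augment(match_est[j], visited):
                        match_est[j] = i
                        return True
            return False

        return sum(augment(i, set()) for i in range(len(adj)))

    # Toy example (hypothetical): ref 0 can match est 0 or 1, ref 1 only est 0.
    print(max_bipartite_matching([[0, 1], [0]]))  # 2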

DAn.

bmcfee commented 10 years ago

Hopcroft-Karp is now implemented in mir_eval.util as of d7df3585734f7d0326d5f854d8f16fe3b3d06373. The boundary detection metric has been rewritten to use it, and it now does The Right Thing.

craffel commented 10 years ago

@ejhumphrey Currently intervals are validated in the chord.score decorator by

        # Intervals should be (n, 2) array
        if intervals.ndim != 2 or intervals.shape[1] != 2:
            raise ValueError('intervals should be an ndarray'
                            ' of size (n, 2)')
        # There should be as many intervals as labels
        if intervals.shape[0] != N:
            raise ValueError('intervals contains {} entries but '
                            'len(reference_labels) = len(estimated_labels)'
                            ' = {}'.format(intervals.shape[0], N))
        if 0 in np.diff(np.array(intervals), axis=1):
            warnings.warn('Zero-duration interval')

The first and last checks are included in util.validate_intervals: https://github.com/craffel/mir_eval/blob/master/mir_eval/util.py#L560 except that instead of just warning on zero-duration intervals, it checks for any interval of length less than or equal to zero. It also checks for negative interval times, which seems helpful. Can you think of any reason I shouldn't replace the first and last check with a call to util.validate_intervals?
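
If the swap goes through, the check could look something like the sketch below (hypothetical helper name; it assumes util.validate_intervals raises ValueError for non-(n, 2) arrays, negative times, and non-positive-duration intervals, as described above):

    from mir_eval import util

    def _validate_chord_input(intervals, N):
        # Shape / duration / negative-time checks delegated to the shared helper.
        util.validate_intervals(intervals)
        # Only the label-count check stays local to the chord module.
        if intervals.shape[0] != N:
            raise ValueError('intervals contains {} entries but '
                             'len(reference_labels) = len(estimated_labels)'
                             ' = {}'.format(intervals.shape[0], N))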

ejhumphrey commented 10 years ago

at the moment no, but I'm shutting down for the day ... lemme pin this to the top of my todo list for tomorrow and give it more brain power then.
