craffel / mir_eval

Evaluation functions for music/audio information retrieval/signal processing algorithms.
MIT License
603 stars 112 forks source link

resample_multipitch improvement #336

Open cwitkowitz opened 3 years ago

cwitkowitz commented 3 years ago

Using the current functionality, if I try to resample the following pitch list,

image

which has no empty pitch observations (taken directly from a JAMS file), I get the following:

image

With the changes, specifying a tolerance of 50 ms, I get the following:

image

A good rule of thumb would be setting the tolerance (if you want to use it) to twice the minimum timing difference in your pitch list.

craffel commented 3 years ago

@rabitt can you take a look?

rabitt commented 3 years ago

@cwitkowitz good catch on this bug! I think the bug boils down to a false assumption the function makes - that the time series is uniformly spaced / time stamps with no pitches are reported with an empty frequency list.

@craffel meta-question - do we want to force multipitch time series to be uniform like we do for melody? @cwitkowitz's solution is more general and would allow non-uniform time scales, but the properties of the metrics are really different if we do/don't report silent frames.

craffel commented 3 years ago

The high-level rule is to try to stick to external conventions, i.e. if in all past datasets/papers/competitions the multipitch time series are uniform, then we can assume that. But if there is existing convention that time series can be non-uniform, we should support that. Either way, we should document the convention. Does that make sense in this case?

rabitt commented 3 years ago

The high-level rule is to try to stick to external conventions, i.e. if in all past datasets/papers/competitions the multipitch time series are uniform, then we can assume that. But if there is existing convention that time series can be non-uniform, we should support that. Either way, we should document the convention. Does that make sense in this case?

I've never seen non-uniform time series for multipitch. and the mirex data from when I implemented this were uniform. Paper-wise, it's one of those details we all tend to leave out. All that said, I'd lean strongly towards uniform.

craffel commented 3 years ago

Ok, in that case, if the code is broken for non-uniform timescales, we should throw an exception when a non-uniform timescale is passed in.

cwitkowitz commented 3 years ago

Here is some information on my use-case. I am using GuitarSet, which has note and pitch annotations in JAMS. When evaluating MF0E performance, I would like to use the pitch annotations, as I believe they are more correct than converting the notes into discrete pitch activity during discrete times.

The pitch annotations are only specified for positive samples (i.e. times where a pitch is active). If I want to use that data with mir_eval, it needs to be converted into uniform sampling first. I thought this was a good place to put that conversion, however, I am open to other suggestions. Maybe the conversion to uniform is appropriate as its own function or somewhere in the JAMS repo.

craffel commented 3 years ago

We could add a separate utility function that converts non-uniform time series to uniform.

cwitkowitz commented 3 years ago

@rabitt / @craffel How does the following sound for a solution? Scrap what I have so far and replace it with a warning similar to that in the melody resampling function. This warns the user if the multipitch times are not uniform.

Then, I can create a utility function for making time-series uniform. In this function, I can estimate the true spacing by ignoring outliers and taking the mean of the differentials (something to that effect). Then I can re-project the original data onto the new collection of times, which would start at 0 and be spaced by the estimated spacing.

Still, this would not always handle sporadic data very well, e.g. if my positive samples are not spaced uniformly or have no consistent sampling in the first place. However, in the case of JAMS data, where pitch contours should be aligned with some sampling period, I believe it would work just fine.

It might also make sense to put the utility function in the JAMS repo, since JAMS pitch_contour seems to be the only case where this has ever been encountered.

craffel commented 3 years ago

This seems like a reasonable solution to me at a high level.

cwitkowitz commented 3 years ago

This seems like a reasonable solution to me at a high level.

Great. I can get started on this then.

rabitt commented 3 years ago

@cwitkowitz If you haven't gotten too far - I have a old script that does exactly this (filling in missing time stamps) if you want to use it as a starting point.

cwitkowitz commented 3 years ago

@rabitt - The script you linked would certainly work, however it requires that the sampling rate and hop length are known. In my case these values are not explicitly defined.

I went a little down the path of implementing what I mentioned above, i.e. estimating the hop length by taking the mean of first-order differentials wherever the second-order differential was close to zero. I wasn't sure what to do at that point. I can step from times[0] to times[-1] using this estimated hop length, but it doesn't necessarily line up perfectly, and it seems we would need 1d interpolation again. There is also no knowledge of the duration, so we can fill in empties from t=0 until times[0], but we cannot fill in empties from times[-1] to t=duration.

Any thoughts?

rabitt commented 3 years ago

@cwitkowitz

I went a little down the path of implementing what I mentioned above, i.e. estimating the hop length by taking the mean of first-order differentials wherever the second-order differential was close to zero. I wasn't sure what to do at that point. I can step from times[0] to times[-1] using this estimated hop length, but it doesn't necessarily line up perfectly, and it seems we would need 1d interpolation again. There is also no knowledge of the duration, so we can fill in empties from t=0 until times[0], but we cannot fill in empties from times[-1] to t=duration.

Maybe it's safer to have hop_length and duration as arguments to the script - presumably that information should be part of the datasets the data comes from, and then there's no room for error. And maybe you could create a separate helper function called estimate_hop_length that returns a number that the user can use/not?

cwitkowitz commented 3 years ago

@rabitt - how does this look? I can add test cases once we are happy with the changes.

cwitkowitz commented 3 years ago

Good suggestion @rabitt

I made that change and added a few other small changes necessary to get the test cases working.

Here a a few final considerations before completing the PR:

  1. NumPy diff() function was not always giving me exact values, which is why I needed the A_TOL comparison rather than strict equality.
  2. The time_series_to_uniform() function works only for observations specified by a list of ndarrays. This is fine for my purposes, but I am not sure if we would like the function to be more general for mir_eval as a whole.
  3. In my test cases, I needed to compare frequencies, so I used repeat code from test_multipitch.py. I did not see any current examples where code within tests/ was being reused, so I wasn't sure if we wanted to start doing that or just ignore it for now.
cwitkowitz commented 2 years ago

@rabitt - I recently came back to this and ran into an error where sometimes the final (original) observation, while still occurring within the specified duration, extends in time beyond what we represent in the new times without including an extra frame that goes over the specified duration. I am still a little bit unsure about the best way to handle this issue. In order to reproduce the error, time_series_to_uniform() can simply be ran with the JAMS data from source 3 of the track 01_BN1-129-Eb_solo in GuitarSet.

My confusion led me to check the mirdata package, to see what was being done there for GuitarSet (see my use-case in the above comment). I then realized that nearly the exact same functionality introduced with this PR in time_series_to_uniform() is supported in _fill_pitch_contour() of the GuitarSet wrapper in the mirdata package.

I also noticed that in the load_pitch_contour() function of the wrapper, the duration provided in the JAMS data is not being passed into _fill_pitch_contour(), which means that potentially 1-2 frames of original observations could be discarded due to the check if time > max_time or t_idx >= n_stamps: in _fill_pitch_contour(). If I am grasping this correctly, one frame may get discarded because of the issue I described above, and another frame may get discarded because the new times won't extend past the final time in the original observations (as opposed to the true duration of the file, which may be larger than the maximum time).

Coming back to this with a fresh look also made me begin to question if we really have the best solution here (in this PR and in mirdata). We are essentially just shifting the times of all of the observations without modifying the values in any way. Wouldn't it be more correct to do some sort of interpolation? I guess that is the point of the resample_multipitch() function in mir_eval, which we can't immediately use here due to the non-uniform time series in the JAMS data (the whole point of this PR).

Sorry for the really long and detailed comment. I needed to write all of this out to organize my own thoughts :grin:.

Since some of this discussion involves mirdata, would it be better to bring any of this up in that repository? Also, would it make more sense to support this functionality within the JAMS package, so that it can be used in multiple different contexts?

cwitkowitz commented 2 years ago

I think the best solution here is simply to add an extra frame in all cases, even if this means the nearest frame on the uniform time grid for the last observation may occur slightly after the specified duration.