craffel / mir_eval

Evaluation functions for music/audio information retrieval/signal processing algorithms.
MIT License

mir_eval.chord.sonify starts late/ends early #322

Closed rabitt closed 4 years ago

rabitt commented 4 years ago

When running the chord sonification code, I'm noticing that the first chord always starts late, and the last chord ends early. For example:

import numpy as np
import mir_eval
import matplotlib.pyplot as plt

fs = 8000
sonification = mir_eval.sonify.chords(['E:min', 'A:7'], np.array([[0, 1], [1, 2]]), fs=fs)

plt.figure(figsize=(15, 7))
plt.plot(sonification)

Since the first chord starts at 0 seconds and the last one ends at 2 seconds, I'd expect to see audio for the entire duration of the clip, but I don't:

[Image: plot of the sonification waveform]

cc @bmcfee

craffel commented 4 years ago

I guess the main question is whether this happens in chord.encode_many or sonify.chroma: https://github.com/craffel/mir_eval/blob/master/mir_eval/sonify.py#L324 Any thoughts?

daturkel commented 4 years ago

The problem seems to be in sonify.time_frequency, particularly in this section for linearly transitioning between two chords:

    # Pre-allocate output signal
    output = np.zeros(length)
    time_centers = np.mean(times, axis=1) * float(fs)

    for n, frequency in enumerate(frequencies):
        # Get a waveform of length samples at this frequency
        wave = _fast_synthesize(frequency)

        # Interpolate the values in gram over the time grid
        if len(time_centers) > 1:
            gram_interpolator = interp1d(
                time_centers, gram[n, :],
                kind='linear', bounds_error=False,
                fill_value=0.0)
        # If only one time point, create constant interpolator
        else:
            gram_interpolator = _const_interpolator(gram[n, 0])

        # Scale each time interval by the piano roll magnitude
        for m, (start, end) in enumerate((times * fs).astype(int)):
            # Clip the timings to make sure the indices are valid
            start, end = max(start, 0), min(end, length)
            # add to waveform
            output[start:end] += (
                wave[start:end] * gram_interpolator(np.arange(start, end)))

The x values passed to scipy.interpolate.interp1d are the centers of each interval, so the period before the center of the first chord interval and the period after the center of the last chord interval fall outside the range the function interpolates over. Since fill_value is set to 0.0, points outside this range are set to 0.
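To make the effect concrete, here is a minimal sketch that reproduces only the interpolation step, using the intervals and sample rate from the example in the original report. The `magnitudes` array is a made-up stand-in for one row of gram, not values taken from mir_eval.

```python
import numpy as np
from scipy.interpolate import interp1d

fs = 8000
times = np.array([[0, 1], [1, 2]])           # chord intervals from the report
time_centers = np.mean(times, axis=1) * fs   # interval centers in samples
magnitudes = np.array([1.0, 0.5])            # hypothetical gram row for one frequency

# Same interpolator settings as the quoted section of sonify.time_frequency
envelope = interp1d(time_centers, magnitudes, kind='linear',
                    bounds_error=False, fill_value=0.0)

# Sample indices before, at, between, and after the two interval centers
samples = np.array([0, 3999, 4000, 8000, 12000, 12001, 15999])
values = envelope(samples)
print(values)  # samples outside [4000, 12000] come back as 0
```

With these inputs, everything before sample 4000 and after sample 12000 is multiplied by 0, which would account for the silent first and last half second in the plot.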

Scipy has a nice option for interp1d's fill_value that can deal with this:

If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.

Taking advantage of that, we can tell the interpolator to treat anything left of the first center as gram[n, 0] and anything right of the last center as gram[n, -1], i.e. the first and last magnitudes for frequency n:

        # Interpolate the values in gram over the time grid
        if len(time_centers) > 1:
            gram_interpolator = interp1d(
                time_centers, gram[n, :],
                kind='linear', bounds_error=False,
                fill_value=(gram[n, 0], gram[n, -1]))

This maintains the current linear chord transition behavior (which is anchored at the centers of adjacent intervals) but doesn't miss the first half of the first interval and second half of the last interval.
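As a quick sanity check of the tuple form of fill_value, here is a standalone sketch that builds the per-sample envelope for a single frequency. The helper name `edge_held_envelope` and the magnitudes are hypothetical and only for illustration; they are not part of mir_eval.

```python
import numpy as np
from scipy.interpolate import interp1d


def edge_held_envelope(time_centers, magnitudes, length):
    """Hypothetical helper: linear envelope that holds its edge values.

    Samples left of the first center take magnitudes[0]; samples right of
    the last center take magnitudes[-1].
    """
    interpolator = interp1d(
        time_centers, magnitudes,
        kind='linear', bounds_error=False,
        fill_value=(magnitudes[0], magnitudes[-1]))
    return interpolator(np.arange(length))


fs = 8000
times = np.array([[0, 1], [1, 2]])           # intervals from the report
time_centers = np.mean(times, axis=1) * fs   # interval centers in samples
held = edge_held_envelope(time_centers, np.array([1.0, 0.5]), length=2 * fs)

# The envelope is now non-zero over the whole two-second clip
print(held[0], held[fs], held[-1])
```

The transition between the two centers is still linear; only the regions outside the centers change, from zero to the nearest chord's magnitude.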

I made the one-line change in a fork and can create a PR if this is a satisfactory solution.