bmcfee / resampy

Efficient sample rate conversion in python
https://resampy.readthedocs.io
ISC License
253 stars 36 forks

Edge Case, rounding error in length calculation #111

Closed cyberso closed 2 years ago

cyberso commented 2 years ago

When calling resampy.resample(data, original_sr, target_sr) with original_sr = 12499 and target_sr = 15001, the returned output is one sample short, i.e. it corresponds to a sampling rate of 15000.

To reproduce this error (with synthetic data of length 12499):

>>> import random
>>> import numpy as np
>>> import resampy
>>> a = [random.random() for i in range(12499)]
>>> b = resampy.resample(np.array(a), 12499, 15001)
>>> np.shape(b)
(15000,)
bmcfee commented 2 years ago

What's happening here is due to the following lines which calculate the shape of the output array:

https://github.com/bmcfee/resampy/blob/2ca177f53f60ce39317cb35a548c8c69c011b63c/resampy/core.py#L106-L110
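Paraphrasing those lines (variable names here are mine, not the exact source), the output length is computed roughly as:

```python
# Hedged paraphrase of the output-shape calculation: the rate ratio is
# computed in floating point first, then the product is truncated
# toward zero by the int() cast.
sr_orig, sr_new = 12499, 15001
n_in = 12499

ratio = sr_new / sr_orig        # 1.2001760140811264 (already rounded down)
n_out = int(n_in * ratio)       # truncates 15000.999999999998 to 15000
print(n_out)
```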

If you compute the shape directly in python, it reveals a floating point roundoff:

In [1]: 15001 / 12499
Out[1]: 1.2001760140811264

In [2]: 12499 * (15001 / 12499)
Out[2]: 15000.999999999998

which then gets clamped down to 15000 when we cast to int. This is one of those times where the non-associativity of floating point arithmetic can bite you.

Now, we could have used rounding here instead of a floor/integer cast, but that would cause problems for other input configurations, where rounding up would extrapolate off the end of the input array rather than interpolate.

It's worth noting that behavior with this configuration is not consistent across different resampling engines. Both resampy and samplerate produce 15000 samples in this configuration, probably because they're based on the same calculation. soxr and scipy.signal produce 15001.

Probably your best bet here, if you're really worried about it, is to explicitly construct the time grid you want to sample, and use the resample_nu function.
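For example (a sketch only; this assumes one second of input, so the desired grid is 15001 evenly spaced timestamps over that second):

```python
import numpy as np

sr_orig, sr_new = 12499, 15001
x = np.random.randn(sr_orig)          # one second of input at 12499 Hz

# Build the output time grid you want explicitly: exactly sr_new
# sample times (in seconds) spanning the input's duration.
t_out = np.arange(sr_new) / sr_new    # 15001 timestamps in [0, 1)

# With resampy installed, sample at exactly those times:
# y = resampy.resample_nu(x, sr_orig, t_out)  # len(y) == len(t_out)
```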

I'm not sure there's much else we can do, though a different order of operations in the length calculation might produce more consistent results. (I'm not sure that's possible in general, but it's worth looking into.)

bmcfee commented 2 years ago

Note: I've updated the title here because the issue is the length, not the sampling rate (which is correctly computed).

bmcfee commented 2 years ago

Confirmed that changing the order of operations here does fix the calculation. It may in turn cause other problems, but I somewhat doubt it. The fix in #113 performs the integer multiplication first (which is exact) and then divides, so there is only one step that can induce round-off error instead of two.
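The difference between the two orderings can be checked directly in plain Python (this is a sketch of the calculation, not the exact library code):

```python
n_in, sr_orig, sr_new = 12499, 12499, 15001

# Old order: divide first, then multiply -- two rounding steps,
# and the truncation clamps the result down by one.
old_len = int(n_in * (sr_new / sr_orig))   # 15000 (off by one)

# Fixed order: the integer product n_in * sr_new is exact, so only
# the final division can round.
new_len = int(n_in * sr_new / sr_orig)     # 15001

print(old_len, new_len)
```

Here 12499 * 15001 = 187497499 is well below 2**53, so the product is exact in floating point and the division returns exactly 15001.0.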