astrofrog / fast-histogram

:zap: Fast 1D and 2D histogram functions in Python :zap:
BSD 2-Clause "Simplified" License
266 stars 28 forks source link

histogram1d returns incorrect bins due to rounding error #63

Open jokabrink opened 1 year ago

jokabrink commented 1 year ago

Hello,

the function histogram1d (and likely histogram2d) can increment the wrong bin when a floating-point rounding error happened. This happens at https://github.com/astrofrog/fast-histogram/blob/main/fast_histogram/_histogram_core.c#L173, when the value of ix is calculated just below a whole number due to rounding. The index lookup in the next line then truncates the value to a whole number resulting in an incorrectly incremented bin one below.

See here:

left, right = (0.9, 1.1)
bins = 10
norm = bins/(right-left)
value = 1.00
ix = (value-left)*norm;
print(ix, int(ix))  # 4.999999999999997 4

The incorrect behaviour of histogram1d can be reproduced with the following script:

#!/usr/bin/env python3

import numpy as np
from fast_histogram import histogram1d

data = np.array([0.999, 1.000, 1.001, 1.099, 1.100, 1.101])

bins, edges = np.histogram(data, bins=10, range=(0.9, 1.1))
fast_bins = histogram1d(data, bins=10, range=(0.9, 1.1))
print(data)
print(edges)
print("#### NumPy")
print(bins)  # [0 0 0 0 1 2 0 0 0 2]
print("#### Fast-Histogram")
print(fast_bins)  # [0. 0. 0. 0. 2. 1. 0. 0. 0. 1.]

Also this includes another error (miscounted right-most value), the incorrect result due to the rounding error is the in the middle of the bins ( (1,2) vs (2,1) ).