Closed LFISHER7 closed 11 months ago
Thanks for this @LFISHER7. Could you point to some examples of the bug? It seems odd that some percentiles were repeated; the context would help me understand your proposed fix.
Sorry for the delay in responding to this! I've introduced a failing test to highlight the issue and suggested a fix
Taking a look at this, I think it results from floating point precision with np.arrange
i.e.
>>> np.arange(0.1, 1.0, 0.1)[6]
0.7000000000000001
The resulting percentiles are
>>> df["percentile"][14]
0.060000000000000005
>>> df["percentile"][15]
0.06999999999999999
and truncated to ints, they are both 6
>>> int(first*100)
6
>>> int(second*100)
6
if round had been used instead,
>>> round(100*first)
6.0
>>> round(100*second)
7.0
Another solution could be to do the rounding when creating the arrays
deciles = (np.arange(1, 10) / 10)
bottom_percentiles = (np.arange(1, 10) /100)
top_percentiles = (np.arange(91, 100) /100)
And maybe the most readable/understandable solution would be to avoid numpy.arrange deciles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] bottom_percentiles =[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09] top_percentiles = [0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]
Closing in favour of #37.
OpenSAFELY SRO work uses the deciles chart reusable action, which in turn uses the decile chart function here.
When showing the outer percentiles, I was hitting a bug whereby some percentiles were repeated. This is a proposed fix, but there may be better ways to handle this.
This also refactors the function to compute deciles in passing, in line with previous SRO work by @iaindillingham.