AllenDowney / ThinkStats2

Text and supporting code for Think Stats, 2nd Edition
http://allendowney.github.io/ThinkStats2/
GNU General Public License v3.0

error in Percentile2 #22

Closed AllenDowney closed 2 years ago

AllenDowney commented 9 years ago

From Gary Foreman

I believe I have found an error in your Percentile2 function that you include in Chapter 4.2 of the second edition. Specifically, the Percentile function and the Percentile2 function do not produce consistent results.

In the context of the example from the book, where we have a list of scores [55, 66, 77, 88, 99], let's consider the output of the two functions when we want to find the score with percentile rank 81. The Percentile function searches for the lowest score whose percentile rank is at least 81. Score 88 has percentile rank 80, so the output will be 99, i.e. the next score up.
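For reference, the scan-based behavior described above can be reproduced with a minimal sketch. The lowercase function names are stand-ins written for this illustration, reconstructed from the description in this issue; the book's actual code may differ in detail.

```python
def percentile_rank(scores, your_score):
    """Percentage of scores less than or equal to your_score."""
    count = sum(1 for score in scores if score <= your_score)
    return 100.0 * count / len(scores)


def percentile(scores, rank):
    """Lowest score whose percentile rank is at least `rank` (linear scan)."""
    for score in sorted(scores):
        if percentile_rank(scores, score) >= rank:
            return score
    return None  # only reached if rank is greater than 100


scores = [55, 66, 77, 88, 99]
print(percentile_rank(scores, 88))  # 80.0
print(percentile(scores, 81))       # 99
```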

The Percentile2 function determines index as

    index = percentile_rank * (len(scores) - 1) // 100

For the case of percentile_rank = 81 and scores = [55, 66, 77, 88, 99]:

    index = 81 * (5 - 1) // 100 = 324 // 100 = 3
    sorted(scores)[3] = 88

So Percentile2 returns 88, whereas Percentile returns 99.
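The index arithmetic above can be checked directly. This is a small sketch using the formula quoted in this issue; the lowercase name percentile2 is a stand-in for illustration, and integer percentile ranks are assumed.

```python
def percentile2(scores, rank):
    """Index-based lookup using the formula quoted in this issue."""
    ordered = sorted(scores)
    # Floor division; assumes `rank` is an integer so the index is an int.
    index = rank * (len(ordered) - 1) // 100
    return ordered[index]


scores = [55, 66, 77, 88, 99]
print(81 * (len(scores) - 1) // 100)  # 3
print(percentile2(scores, 81))        # 88
```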

I've coded this up and posted it as a gist: https://gist.github.com/garyForeman/89ab99cd83ac47acd900

I've also created an alternative function called MyPercentile2, which behaves consistently with Percentile (at least for the test cases I have explored). Specifically, I assign index as

    index = int(math.ceil(percentile_rank * len(scores) / 100.)) - 1
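One way to probe the consistency claim is to compare the ceiling-based index against a linear scan for every integer rank from 1 to 100. The sketch below is an illustration only, not the code from the gist; my_percentile2 and percentile_by_scan are hypothetical names, and the scan is a reconstruction of the Percentile behavior described in this issue.

```python
import math


def my_percentile2(scores, rank):
    """Index-based lookup using the ceiling formula proposed above."""
    ordered = sorted(scores)
    index = int(math.ceil(rank * len(ordered) / 100.)) - 1
    return ordered[index]


def percentile_by_scan(scores, rank):
    """Lowest score whose percentile rank is at least `rank`."""
    ordered = sorted(scores)
    n = len(ordered)
    return next(s for s in ordered
                if 100.0 * sum(x <= s for x in ordered) / n >= rank)


scores = [55, 66, 77, 88, 99]
mismatches = [r for r in range(1, 101)
              if my_percentile2(scores, r) != percentile_by_scan(scores, r)]
print(my_percentile2(scores, 81))  # 99
print(mismatches)                  # []
```

The check starts at rank 1 because a rank of 0 makes the formula produce index -1, which Python interprets as the last element of the list; a guard for that case may be worth adding.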

Please feel free to contact me if you have any questions or comments about the code on gist. And again, I really appreciate all the work you've done to create this valuable resource.