avatarzhang / python-statlib

Automatically exported from code.google.com/p/python-statlib

median() broken #1

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I believe the median() function is completely hosed: for the list below, which contains only positive numbers, it returns a negative value.

>>> xs=[263.02819001674698, 524.84250500798203, 0.12129300832748401, 
7.6110050082206699, 68.885652005672497, 0.040983021259307903, 
0.019165009260177598, 0.020912021398544301, 0.113341003656387, 
0.89126500487327598, 0.106117993593216, 0.058337002992629998, 
0.080624014139175401, 0.052327990531921401, 0.080542981624603299, 
0.056782990694046, 0.019524991512298601, 0.464242994785309, 
1187.8612969815699, 0.0811200141906738, 584.71863499283802, 
564.39484900236096, 1038.5128569901001, 0.300267994403839, 
0.097283005714416504, 0.256449013948441, 0.71837198734283403, 
555.23849099874496, 0.033033996820449801, 0.026777982711791999, 
0.51870399713516202, 0.085212975740432698, 0.103229999542236, 
0.066174000501632704, 0.065015017986297594, 0.13031598925590501, 
0.075087994337081895, 0.047565996646881097, 0.18698698282241799, 
0.46866002678871199, 0.10801398754119899, 0.20655500888824499, 
0.13292402029037501, 0.0135589838027954, 0.221956998109818, 
0.028318017721176099, 27.163345992565201, 0.445084989070892, 
7.3120360076427504, 0.402972012758255, 196.204346984625, 
0.014095991849899301, 0.18600699305534399, 0.077917993068695096, 
0.51648098230361905, 0.45996898412704501, 112.70363900065399, 
0.149616003036499, 587.66046202182804, 0.182321012020111, 
2265.7604490220501, 3.3876670002937299, 0.066388010978698703, 
0.070600986480712905, 0.57084199786186196, 0.0147069990634918, 
0.023294001817703198, 0.0612359941005707, 6.6273670196533203, 
44.348473995924003, 171.04825401306201, 0.74552002549171403, 
0.067524999380111694, 0.072905004024505601, 49.498228996992097, 
10.317258000373799, 6.9318929910659799, 1668.05913001299, 
0.042795985937118503, 0.089037001132965102, 0.094287991523742704, 
0.082978010177612305, 0.092319995164871202, 0.30596598982811002, 
1.8339200019836399]
>>> import statlib.stats
>>> statlib.stats.median(xs)
-1.5030296603357087
>>> sorted(xs)[len(xs)//2]
0.182321012020111
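The midpoint lookup above is the ordinary sample median when the number of values is odd (xs has 85). For an even count it returns the upper of the two middle values. Here is a small Python 3 sketch of the textbook definition, cross-checked against the standard library's statistics.median (Python 3.4+). sample_median is a hypothetical helper name and is not part of statlib:

```python
import statistics


def sample_median(values):
    """Return the middle value of the sorted data (mean of the two middle values when n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: same element as sorted(values)[len(values) // 2]
    return (ordered[mid - 1] + ordered[mid]) / 2.0  # even n: average the two middle values


odd = [263.03, 0.12, 7.61, 0.04, 68.89]
even = [1.0, 2.0, 3.0, 4.0]
print(sample_median(odd), statistics.median(odd))         # both 7.61
print(sample_median(even), sorted(even)[len(even) // 2])  # 2.5 versus the upper middle value 3.0
```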

Original issue reported on code.google.com by yaa...@gmail.com on 28 Jan 2008 at 7:09

GoogleCodeExporter commented 8 years ago
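The way this works may be slightly counterintuitive: the usual definition of the median (the value at the midpoint that you mention) is not the one median() implements. median() instead estimates the median from a histogram of the data (a grouped-data median), so its result depends on the number of bins.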
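For reference, the grouped-data median found in introductory statistics texts interpolates linearly inside the histogram bin that holds the middle of the data:

```latex
\tilde{x} \approx L + \frac{n/2 - F}{f}\, w
```

Here $n$ is the number of observations, $L$ is the lower limit of the first bin whose cumulative count reaches $n/2$, $F$ is the count in all bins below it ($F = 0$ when it is the first bin), $f$ is the count inside it, and $w$ is the bin width. Since $0 < n/2 - F \le f$, the estimate always falls inside that bin, so it never lies outside the range of the data. For an odd number of observations it is within one bin width $w$ of the middle score.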

In the stats package, the medianscore() function produces the output that you are after:

>>> from statlib import stats
>>> print(stats.medianscore(xs))
0.18232101202

The median() function also takes a numbins parameter. If it is set sufficiently high, median() approximates the medianscore() result above:

>>> print(stats.median(xs, numbins=100000))
0.175937510922

As per the docstring:

"""
    Returns the computed median value of a list of numbers, given the
    number of bins to use for the histogram (more bins brings the computed value
    closer to the median score, default number of bins = 1000).  See G.W.
    Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.

    Usage:   lmedian (inlist, numbins=1000)
"""

Original comment by istvan.a...@gmail.com on 28 Jan 2008 at 1:55
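The docstring's description can be illustrated with a minimal Python 3 sketch of a histogram (grouped-data) median. grouped_median is a hypothetical helper, not statlib's lmedian. It assumes equal-width bins spanning exactly [min, max], with the maximum clamped into the last bin, and statlib's own bin limits may differ. It applies the interpolation L + (n/2 - F)/f * w, taking F = 0 when the median falls in the first bin:

```python
import statistics


def grouped_median(values, numbins=1000):
    """Estimate the median by linear interpolation inside a histogram bin.

    Hypothetical helper illustrating the grouped-data technique; it is not
    statlib's lmedian. Assumes equal-width bins spanning exactly
    [min(values), max(values)].
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # all values equal: nothing to interpolate
    width = (hi - lo) / numbins
    counts = [0] * numbins
    for x in values:
        # Clamp so that max(values) lands in the last bin instead of overflowing.
        counts[min(int((x - lo) / width), numbins - 1)] += 1

    half = len(values) / 2.0
    below = 0  # observations in bins before the current one (0 for the first bin)
    for i, f in enumerate(counts):
        if below + f >= half:
            # f > 0 here, because below < half on entry to this bin.
            return lo + i * width + (half - below) / f * width
        below += f
    return hi  # unreachable: the last bin always brings the count to n


# Skewed sample: most values are small, a few are very large.
data = [0.02, 0.05, 0.1, 0.2, 0.3, 0.5, 1.8, 7.3, 112.7, 587.7, 2265.8]
for bins in (10, 1000, 100000):
    print(bins, grouped_median(data, bins))
print("middle score:", statistics.median(data))
```

On this skewed sample the 10-bin estimate lands far above the middle score of 0.5, because 9 of the 11 values share the first bin. Raising numbins narrows the bins and pulls the estimate toward 0.5, which is the trade-off the docstring describes.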