AllenDowney / ThinkStats2

Text and supporting code for Think Stats, 2nd Edition
http://allendowney.github.io/ThinkStats2/
GNU General Public License v3.0

code in book does not seem to work #16

Closed flothesof closed 2 years ago

flothesof commented 9 years ago

Hi Allen,

Thanks for your book, it's great. I executed the code in section 4.2, which reads as follows in your source TeX code:

\begin{verbatim}
def Percentile2(scores, percentile_rank):
    scores.sort()
    index = percentile_rank * (len(scores)-1) / 100
    return scores[index]
\end{verbatim}

When I executed this

scores = [55, 66, 77, 88, 99]
Percentile2(scores, 50.)

I get an error because index is not an integer but a floating-point value. I suggest casting to int, as in

def Percentile2(scores, percentile_rank):
    scores.sort()
    index = int(percentile_rank * (len(scores)-1) / 100)
    return scores[index]

I guess this solution still needs checking for appropriate rounding...
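
For what it's worth, here is how I'd sketch it with round() instead of a plain int() cast (my own variant, so the rounding convention may not be the one you intended):

```python
def Percentile2(scores, percentile_rank):
    """Value at the given percentile rank (0-100), nearest-rank style."""
    scores = sorted(scores)  # avoid mutating the caller's list
    # round() picks the nearest index; int() would always truncate down
    index = round(percentile_rank * (len(scores) - 1) / 100)
    return scores[index]

scores = [55, 66, 77, 88, 99]
print(Percentile2(scores, 50))  # 77
```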

flothesof commented 9 years ago

Hi, it's me again, still in chapter 4. Sorry to write this here again, but it's unrelated to my previous issue.

The code below works well but I'm a bit surprised by the naming choice for the variable t. Wouldn't it have been better to call it sample? It was a little bit confusing to me when I first saw it. Also, you use the word "sample" later on to describe the list of values you're using. This might be worth considering (if you have the time).

def EvalCdf(t, x):
    count = 0.0
    for value in t:
        if value <= x:
            count += 1
    prob = count / len(t)
    return prob
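
To make the renaming concrete, here is the same function with t called sample, plus a quick check with made-up values (my own example, not the book's data):

```python
def EvalCdf(sample, x):
    """Fraction of values in sample that are less than or equal to x."""
    count = 0.0
    for value in sample:
        if value <= x:
            count += 1
    return count / len(sample)

sample = [1, 2, 2, 3, 5]
print(EvalCdf(sample, 2))  # 0.6
```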

Thanks!

flothesof commented 9 years ago

Another comment:

Figure 5.8 shows normal probability plots for adult weights, w, and for their
logarithms, log10 w. Now it is apparent that the data deviate substantially
from the normal model. The lognormal model is a good match for the data
within a few standard deviations of the mean, but it deviates in the tails. I
conclude that the lognormal distribution is a good model for this data.

Isn't this supposed to be

The normal model is a good match for the data within a few standard deviations of the mean, but it deviates in the tails. ?
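
One way to see why the wording matters (a little simulation of my own, not book code): data drawn from a lognormal distribution is heavily skewed on a linear scale, but close to symmetric after taking log10.

```python
import math
import random

random.seed(17)
# Simulated "weights": lognormal, so log10(w) should be normal
weights = [random.lognormvariate(4.0, 0.25) for _ in range(10000)]
logs = [math.log10(w) for w in weights]

def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

print(skewness(weights))  # clearly positive
print(skewness(logs))     # near zero
```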

AllenDowney commented 9 years ago

Thank you for all of these. It will take me a while to process them, but I will get to it soon!

Allen


flothesof commented 9 years ago

Another small comment about the normal / lognormal models. Figure 5.7 has the following caption:

CDF of adult weights on a linear scale (left) and log scale (right).

One thing that doesn't appear clearly in this caption is the fact that on the left the model is a normal one, while on the right it's a lognormal one. So I would suggest modifying the labels within the figure ("normal model" and "lognormal model") and changing the caption to:

CDF of adult weights on a linear scale, fitted using a normal model (left) and log scale, fitted using a lognormal model (right).

Thanks!

flothesof commented 9 years ago

Another one (I'm using your PDF version 2.0.23): in Chapter 6 it reads

>>> sample = [random.gauss(mean, std) for i in range(500)]
>>> sample_pdf = thinkstats2.EstimatedPdf(sample)
>>> thinkplot.Pdf(pdf, label='sample KDE')

I believe this should be

>>> thinkplot.Pdf(sample_pdf, label='sample KDE')

flothesof commented 9 years ago

Small typo here:

If you are not familiar with moment of inertia, see
\url{http://en.wikipedia.org/wiki/Moment_of_inertia}.  \index{moment
  of inertia}.

There's a dot that shouldn't be there after the \index (this dot shows up in the PDF document).

flothesof commented 9 years ago

Also I was surprised by this:

def Median(xs):
    cdf = thinkstats2.MakeCdfFromList(xs)
    return cdf.Value(0.5)

Why don't we just use thinkstats2.Cdf(xs) instead? This is the way we were "taught" to create CDFs so far in the book, so why use this other, unintroduced function here?
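
To make the suggestion concrete, here is what I'd expect thinkstats2.Cdf(xs).Value(0.5) to compute, sketched in plain Python (my own reading of the code, so take it with a grain of salt):

```python
def MedianViaCdf(xs):
    """Median as the inverse CDF at p=0.5: the smallest value whose
    cumulative fraction reaches one half."""
    xs = sorted(xs)
    n = len(xs)
    for i, x in enumerate(xs):
        if (i + 1) / n >= 0.5:
            return x

print(MedianViaCdf([3, 1, 4, 1, 5]))  # 3
```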

flothesof commented 9 years ago

In the solutions to the exercises of chapter 6:

With a higher upper bound, the moment-based skewness increases, as
expected.  Surprisingly, the Person skewness goes down!  The reason
seems to be that increasing the upper bound has a modest effect on the
mean, and a stronger effect on standard deviation.  Since std is in
the denominator with exponent 3, it has a stronger effect on the
result.

The comment about std being in the denominator with exponent 3 is incorrect, isn't it? It's exponent 1!

AllenDowney commented 9 years ago

I've processed these and made corrections and changes. I'd like to add you to the contributor list. Should I use your github login, or do you want to email me your IRL name?

About skewness, the std does appear in the sample skewness with exponent 3. See http://en.wikipedia.org/wiki/Skewness#Sample_skewness
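
To spell out the distinction, here is a plain-Python sketch of both statistics (not the code from thinkstats2): in the sample skewness the std appears with exponent 3, in Pearson's median skewness with exponent 1.

```python
import statistics

def sample_skewness(xs):
    """g1 = m3 / std**3 -- std appears with exponent 3."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / std ** 3

def pearson_median_skewness(xs):
    """gp = 3 * (mean - median) / std -- std appears with exponent 1."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return 3 * (mean - statistics.median(xs)) / std

xs = [1, 2, 2, 3, 10]
print(sample_skewness(xs))
print(pearson_median_skewness(xs))
```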

flothesof commented 9 years ago

Addition to my previous comment: the reason I was saying the exponent is 1 is that the sentence you wrote in the solution file is about Pearson's measure of skewness, not the sample skewness (if you'd been talking about the sample skewness, the comment would have been correct, obviously). Therefore I'd suggest the following rephrase: (also, there was a typo on "Pearson")

With a higher upper bound, the moment-based skewness increases, as
expected.  

Surprisingly, the Pearson skewness goes down!  The reason
seems to be that increasing the upper bound has a modest effect on the
mean, and a stronger effect on standard deviation, which is in
the denominator, and thus has a stronger effect on the
result.

flothesof commented 9 years ago

Chapter 7, scatter plots: your default code for scatter plots includes the following options

options = _Underride(options, color='blue', alpha=0.2,
                     s=30, edgecolors='none')

Therefore, the code that you say yields Figure 7.1, thinkplot.Scatter(heights, weights), does not actually produce that figure, because of the transparency, which is a little misleading.

However, it's nice to have transparency by default, so I guess it would be more helpful to say that the code is thinkplot.Scatter(heights, weights, alpha=1)? But then you need to explain what alpha does...
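
Assuming _Underride just fills in defaults for options the caller didn't pass (that's my reading of the name), the behavior is roughly:

```python
def underride(options, **defaults):
    """Fill in defaults for any keys missing from options
    (my sketch of what I assume thinkplot's _Underride does)."""
    for key, value in defaults.items():
        options.setdefault(key, value)
    return options

# The caller's alpha=1 wins over the default alpha=0.2
opts = underride({'alpha': 1}, color='blue', alpha=0.2, s=30, edgecolors='none')
print(opts['alpha'])  # 1
```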

flothesof commented 9 years ago

Docstring for HexBin: shouldn't that be "makes a hexbin plot"?

def HexBin(xs, ys, **options):
    """Makes a scatter plot.
...

flothesof commented 9 years ago

Hi Allen,

small typo: you have an unnecessary parenthesis at the end of the following line of code found in section 7.7 (Spearman's correlation)

thinkstats2.Corr(df.htm3, np.log(df.wtkg2)))

It should be:

thinkstats2.Corr(df.htm3, np.log(df.wtkg2))

AllenDowney commented 9 years ago

Thanks again. I will get to all of these soon!


flothesof commented 9 years ago

Hi Allen,

I just finished the exercises for chapter 8 and have a couple of remarks regarding Exercise 8.3 (hockey / soccer games).

The problem statement is:

Is this way of making an estimate biased?  Plot the sampling
distribution of the estimates and the 90\% confidence interval.  What
is the standard error?  What happens to sampling error for increasing
values of {\tt lam}?

Your solution does not address the confidence interval. This is actually a good point: when I computed the confidence interval, I realized it is fairly meaningless in this context. In one of my tests, I set lambda=0.3 and got a confidence interval of [0; 1], which is to say that we always expect either 0 or 1 goals per match. Since the problem statement asks for the interval, maybe you could point out that in this context it is not very informative (you probably have a better way of expressing this...)?
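
To illustrate the point, here is roughly what I did (my own reading of the exercise: a game is a unit time interval with exponential inter-arrival goal times, and the goal count is the estimate of lam):

```python
import random

def simulate_game(lam):
    """One game: accumulate exponential inter-arrival times until the
    clock passes 1; the number of goals scored estimates lam."""
    t = goals = 0
    while True:
        t += random.expovariate(lam)
        if t > 1:
            return goals
        goals += 1

random.seed(18)
lam = 0.3
estimates = sorted(simulate_game(lam) for _ in range(10000))
ci = (estimates[500], estimates[9499])  # 5th and 95th percentiles
print(ci)  # a degenerate interval like (0, 1) -- too coarse to be useful
```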

My second point pertains to the second question. Did you really mean to ask what happens when lam increases? As far as I could tell, nothing. Judging from your solution, you probably meant the variable m (the number of games).

As always, thanks for writing this book! :)

flothesof commented 9 years ago

Hi Allen,

small typo (line 6949 of the TeX source):

statistically significant. But considering the two tests togther, I

flothesof commented 9 years ago

Hi Allen,

I've just gone through the exercises of chapter 9 and I have a couple of thoughts:

Other than that, great chapter. Thanks!

AllenDowney commented 2 years ago

Changes in chapter 4 as of 3b598ed

AllenDowney commented 2 years ago

I think I have finally processed all of these. Thank you!