flothesof closed this issue 2 years ago
Hi, it's me again, still in chapter 4. Sorry to write this here again, but it's unrelated to my previous issue.
The code below works well, but I'm a bit surprised by the naming choice for the variable `t`. Wouldn't it have been better to call it `sample`? It was a little bit confusing to me when I first saw it. Also, you use the word "sample" later on to describe the list of values you're using. This might be worth considering (if you have the time).
```python
def EvalCdf(t, x):
    count = 0.0
    for value in t:
        if value <= x:
            count += 1
    prob = count / len(t)
    return prob
```
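For what it's worth, here is a quick check of what the function computes, with made-up values of my own (not from the book):

```python
def EvalCdf(t, x):
    # Fraction of values in t that are <= x
    count = 0.0
    for value in t:
        if value <= x:
            count += 1
    return count / len(t)

sample = [1, 2, 2, 3, 5]
print(EvalCdf(sample, 3))  # 0.8 -- four of the five values are <= 3
```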
Thanks!
Another comment:
Figure 5.8 shows normal probability plots for adult weights, w, and for their
logarithms, log10 w. Now it is apparent that the data deviate substantially
from the normal model. The lognormal model is a good match for the data
within a few standard deviations of the mean, but it deviates in the tails. I
conclude that the lognormal distribution is a good model for this data.
Isn't this supposed to be
The normal model is a good match for the data within a few standard deviations of the mean, but it deviates in the tails. ?
Thank you for all of these. It will take me a while to process them, but I will get to it soon!
Allen
Another small comment about the normal / lognormal models. Figure 5.7 has the following caption:
CDF of adult weights on a linear scale (left) and log scale (right).
One thing that doesn't appear clearly in this caption is the fact that on the left the model is a normal one, while on the right it's a lognormal one. So I would suggest modifying the labels within the figure ("normal model" and "lognormal model") and changing the caption to:
CDF of adult weights on a linear scale, fitted using a normal model (left) and log scale, fitted using a lognormal model (right).
Thanks!
Another one: (I'm using your PDF version 2.0.23) in Chapter 6 it reads
```python
>>> sample = [random.gauss(mean, std) for i in range(500)]
>>> sample_pdf = thinkstats2.EstimatedPdf(sample)
>>> thinkplot.Pdf(pdf, label='sample KDE')
```
I believe this should be
```python
>>> thinkplot.Pdf(sample_pdf, label='sample KDE')
```
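For readers who want to see what `EstimatedPdf` is doing without the book's modules (I believe it wraps `scipy.stats.gaussian_kde`), here is a bare-bones, stdlib-only sketch of a Gaussian KDE. The bandwidth here is picked by hand, which a real KDE would not do:

```python
import math
import random

def kde_density(sample, x, bandwidth=0.5):
    # Average of Gaussian bumps centered at each sample point;
    # bandwidth is fixed by hand here (real KDEs choose it from the data)
    n = len(sample)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
    return total / (n * bandwidth * math.sqrt(2 * math.pi))

random.seed(17)
sample = [random.gauss(0, 1) for i in range(500)]
# density near 0: roughly the N(0, 1) peak height (~0.4), biased down
# a little by the smoothing
print(kde_density(sample, 0.0))
```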
Small typo here:
If you are not familiar with moment of inertia, see
\url{http://en.wikipedia.org/wiki/Moment_of_inertia}. \index{moment
of inertia}.
There's a dot that shouldn't be there after the \index (this dot shows up in the PDF document).
Also I was surprised by this:
```python
def Median(xs):
    cdf = thinkstats2.MakeCdfFromList(xs)
    return cdf.Value(0.5)
```
Why don't we use just thinkstats2.Cdf(xs) instead? This is the way we were "taught" to create CDFs so far in the book, so why use this other, unintroduced function there?
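In case it helps other readers, here is roughly what I understand `cdf.Value(0.5)` to compute — a stdlib-only simplification of my own, not the actual thinkstats2 implementation:

```python
def median_via_cdf(xs):
    # The smallest value whose cumulative probability reaches 0.5
    # (my simplification of what cdf.Value(0.5) does)
    xs = sorted(xs)
    n = len(xs)
    for i, x in enumerate(xs):
        if (i + 1) / n >= 0.5:
            return x

print(median_via_cdf([1, 2, 3, 4, 5]))  # 3
```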
In the solutions to the exercise of chapter 6:
With a higher upper bound, the moment-based skewness increases, as
expected. Surprisingly, the Person skewness goes down! The reason
seems to be that increasing the upper bound has a modest effect on the
mean, and a stronger effect on standard deviation. Since std is in
the denominator with exponent 3, it has a stronger effect on the
result.
The comment about std being in the denominator with exponent 3 is incorrect, isn't it? It's exponent 1!
I've processed these and made corrections and changes. I'd like to add you to the contributor list. Should I use your github login, or do you want to email me your IRL name?
About skewness, the std does appear in the sample skewness with exponent 3. See http://en.wikipedia.org/wiki/Skewness#Sample_skewness
Addition to my previous comment: the reason I said the exponent is 1 is that the sentence in your solution file is about Pearson's measure of skewness, not the sample skewness (had you been talking about the sample skewness, the comment would have been correct, obviously). I'd therefore suggest the following rephrasing (there was also a typo in "Pearson"):
With a higher upper bound, the moment-based skewness increases, as
expected.
Surprisingly, the Pearson skewness goes down! The reason
seems to be that increasing the upper bound has a modest effect on the
mean, and a stronger effect on standard deviation, which is in
the denominator, and thus has a stronger effect on the
result.
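To make the exponent question concrete, here are the two measures side by side — my own minimal implementations, not the book's (which, if I remember the names right, are `Skewness` and `PearsonMedianSkewness` in thinkstats2):

```python
import math

def skewness(xs):
    # Moment-based sample skewness: std appears with exponent 3
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / std ** 3

def pearson_median_skewness(xs):
    # Pearson's median skewness: std appears with exponent 1
    n = len(xs)
    mean = sum(xs) / n
    median = sorted(xs)[n // 2]  # assumes odd n, for simplicity
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return 3 * (mean - median) / std

xs = [1, 2, 2, 3, 12]  # right-skewed toy data
print(skewness(xs) > 0, pearson_median_skewness(xs) > 0)  # True True
```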
Chapter 7, scatter plots: your default code for scatter plots includes the following options
```python
options = _Underride(options, color='blue', alpha=0.2,
                     s=30, edgecolors='none')
```
Therefore, the code that you say yields Figure 7.1, `thinkplot.Scatter(heights, weights)`, does not actually produce that figure, because of the transparency, which is a little misleading. However, it's nice to have transparency by default, so I guess it would be more helpful to say that the code is `thinkplot.Scatter(heights, weights, alpha=1)`? But then you need to explain what alpha does...
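On the defaults themselves, here is what I understand `_Underride` to do — a guess at the behavior, not the actual thinkplot source:

```python
def _Underride(d, **options):
    # Fill in defaults only for options the caller did not supply,
    # so explicit arguments always win over the defaults
    if d is None:
        d = {}
    for key, val in options.items():
        d.setdefault(key, val)
    return d

opts = _Underride(dict(alpha=1), color='blue', alpha=0.2, s=30)
print(opts['alpha'], opts['color'])  # 1 blue -- the caller's alpha wins
```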
Docstring for HexBin: shouldn't that be "makes a hexbin plot"?
```python
def HexBin(xs, ys, **options):
    """Makes a scatter plot.
    ...
```
Hi Allen,
small typo: you have an unnecessary parenthesis at the end of the following line of code found in section 7.7 (Spearman's correlation)
thinkstats2.Corr(df.htm3, np.log(df.wtkg2)))
It should be:
thinkstats2.Corr(df.htm3, np.log(df.wtkg2))
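While we're in section 7.7, here is a minimal rank-correlation sketch for readers without pandas handy. This is my own version; it ignores ties, which I believe the book's `SpearmanCorr` handles via pandas `Series.rank`:

```python
import math

def spearman_corr(xs, ys):
    # Spearman's rank correlation: Pearson correlation of the ranks
    # (ties not handled, for simplicity)
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ry) / n)
    return cov / (sx * sy)

print(spearman_corr([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
```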
Thanks again. I will get to all of these soon!
Hi Allen,
I just finished the exercises for chapter 8 and have a couple of remarks regarding Exercise 8.3 (hockey / soccer games).
The problem statement is:
Is this way of making an estimate biased? Plot the sampling
distribution of the estimates and the 90\% confidence interval. What
is the standard error? What happens to sampling error for increasing
values of {\tt lam}?
Your solution does not address the confidence interval. This is actually a good point: when I computed the confidence interval, I realized it is quite meaningless in this context. In one of my tests, I set lambda=0.3 and got a confidence interval of [0, 1], which is to say that we always expect either 0 or 1 goals per match. Since you ask for the interval in the problem statement, maybe you could point out that in this context the confidence interval is not useful (you probably have a better way of expressing this...)?
My second point pertains to the second question. Did you really mean to ask what happens when `lam` increases? As far as I could tell, nothing. Judging from your solution, you probably meant the variable `m` (the number of games).
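For context, this is roughly how I set up the experiment when I got that [0, 1] interval — a sketch under my reading of the exercise (goal times exponential with rate `lam`, one game of unit length, `lam` estimated from a single simulated game; the percentile indices below are my own shortcut for a 90% interval):

```python
import random

def simulate_game(lam):
    # Goals in one game: exponential inter-arrival times with rate lam,
    # counted until the total time exceeds one game
    goals, t = 0, 0.0
    while True:
        t += random.expovariate(lam)
        if t > 1:
            return goals
        goals += 1

def sampling_distribution(lam, iters=1000):
    # Each simulated game yields one integer estimate of lam
    return sorted(simulate_game(lam) for _ in range(iters))

random.seed(1)
estimates = sampling_distribution(0.3)
# 5th and 95th percentiles of the estimates
ci = (estimates[len(estimates) // 20], estimates[-len(estimates) // 20])
# With lam=0.3 the interval spans only a couple of small integers,
# which is why it is not very informative here
print(ci)
```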
As always, thanks for writing this book! :)
Hi Allen,
small typo (line 6949 of the TeX source):
statistically significant. But considering the two tests togther, I
Hi Allen,
I've just gone through the exercises of chapter 9 and I have a couple of thoughts:
Other than that, great chapter. Thanks!
Changes in chapter 4 as of 3b598ed
I think I have finally processed all of these. Thank you!
Hi Allen,
Thanks for your book, it's great. I executed the code in section 4.2, which reads as follows in your source TeX code:
When I executed this, I got an error because `index` is not an integer but a floating-point value. I suggest using a cast to int. I guess this solution still needs checking for appropriate rounding...
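A generic illustration of the failure and the fix — my own toy example, not the book's section 4.2 code:

```python
xs = [10, 20, 30, 40]
index = len(xs) * 0.5   # 2.0 -- arithmetic like this produces a float
# xs[index] raises TypeError ("list indices must be integers");
# casting to int before indexing fixes it
print(xs[int(index)])  # 30
```

As noted above, the cast truncates rather than rounds, so the right rounding rule still needs checking.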