AllenDowney / ThinkStats2

Text and supporting code for Think Stats, 2nd Edition
http://allendowney.github.io/ThinkStats2/
GNU General Public License v3.0

(Chapter 1.7 Validation) Result of value_counts slightly different than one in book #71

Closed by an0o0nym 2 years ago

an0o0nym commented 7 years ago

Hi, I just ran the code provided in Chapter 1.7. While df.outcome.value_counts(sort=False) yields exactly what is shown in the book's example, df.birthwgt_lb.value_counts(sort=False) gives me slightly different output (see below) from what is shown in the book. Other than that little issue, everything is fine: all weight values for the birthwgt_lb column are summarized correctly.

EDIT: Oh, and apparently I'm missing the 51-pound baby :)

EDIT: I just read through the source code of nsfg.py, and it makes sense now: the 51-pound baby is removed by the call to CleanFemPreg.
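For anyone else puzzled by the missing 51-pound record, here is a minimal sketch of the kind of cleaning step that would explain it. The toy data and the cutoff of 20 pounds are assumptions for illustration; check CleanFemPreg in nsfg.py for the actual rule.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the birthwgt_lb column; these values are made up.
df = pd.DataFrame({"birthwgt_lb": [7.0, 8.0, 51.0, 6.0, 97.0, 99.0]})

# Assumed rule: weights above 20 lb are treated as invalid and set to NaN.
df.loc[df.birthwgt_lb > 20, "birthwgt_lb"] = np.nan

# value_counts() skips NaN by default, so the 51 lb record no longer appears.
counts = df.birthwgt_lb.value_counts().sort_index()
print(counts)
```

With a step like this applied when the data is loaded, the 51-pound entry never reaches value_counts, which would match what I'm seeing.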

Result of running df.birthwgt_lb.value_counts(sort=False):

8.0     1889
7.0     3049
6.0     2223
4.0      229
5.0      697
10.0     132
12.0      10
14.0       3
3.0       98
1.0       40
2.0       53
0.0        8
9.0      623
11.0      26
13.0       3
15.0       1
Name: birthwgt_lb, dtype: int64

Can someone please let me know what might be causing this?
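One likely explanation for the ordering, offered as a sketch rather than a confirmed diagnosis: with sort=False, value_counts does not promise any particular row order, so the order can differ between pandas versions even when the counts are identical. Sorting the result by its index gives a stable order by weight. The small Series below is made-up data, not the NSFG file.

```python
import pandas as pd

# Made-up weights standing in for df.birthwgt_lb.
weights = pd.Series([8.0, 7.0, 7.0, 6.0, 10.0, 7.0, 8.0, 0.0], name="birthwgt_lb")

# Row order here is an implementation detail and may vary across pandas versions.
unsorted_counts = weights.value_counts(sort=False)

# Sorting by index (the weight in pounds) makes the order deterministic.
by_weight = unsorted_counts.sort_index()
print(by_weight)
```

Applied to the real data, df.birthwgt_lb.value_counts().sort_index() should list the weights in ascending order, which makes it easier to compare against the book's table line by line.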

I'm running my code using Python 3.5.2 in a virtualenv (15.1.0) with the following dependencies installed:

cycler (0.10.0)
matplotlib (2.0.2)
numpy (1.13.0)
pandas (0.20.2)
patsy (0.4.1)
pip (9.0.1)
pyparsing (2.2.0)
python-dateutil (2.6.0)
pytz (2017.2)
scipy (0.19.1)
setuptools (18.0.1)
six (1.10.0)
statsmodels (0.8.0)
wheel (0.24.0)