AllenDowney / ThinkStats2

Text and supporting code for Think Stats, 2nd Edition
http://allendowney.github.io/ThinkStats2/
GNU General Public License v3.0
4.03k stars 11.31k forks

Chapter 3 exercises #2 solution issue #104

Closed burmecia closed 2 years ago

burmecia commented 6 years ago

For Chapter 3, exercise No. 3, the solution is as follows:

hist = thinkstats2.Hist()

for caseid, indices in preg_map.items():
    if len(indices) >= 2:
        pair = preg.loc[indices[0:2]].prglngth
        diff = np.diff(pair)[0]
        hist[diff] += 1

This solution seems to compare only the 1st baby and the 2nd baby. Would it be better to compare the 1st baby with all the other babies for an individual woman? My suggestion is below:

hist = thinkstats2.Hist()

for case_id, indices in preg_map.items():
    if len(indices) >= 2:
        for i in range(1, len(indices)):
            pair = preg.loc[[indices[0], indices[i]]].prglngth
            diff = np.diff(pair)[0]
            hist[diff] += 1

Comments are welcome. Thanks.

AllenDowney commented 6 years ago

Your version uses more data, but it has some non-independence that complicates things a bit.
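To make the non-independence concrete, here is a minimal sketch using made-up pregnancy lengths for a single woman (the numbers are hypothetical, not NSFG data). In the suggested variant, every difference for that woman reuses the same first-pregnancy length, so an unusually long or short first pregnancy shifts all of her entries at once.

```python
import numpy as np

# Hypothetical pregnancy lengths in weeks for one woman, in birth order.
lengths = np.array([42, 39, 38, 39])

# Posted solution: a single difference, second minus first.
first_vs_second = np.diff(lengths[0:2])

# Suggested variant: every later pregnancy minus the first.
# All of these differences share the same first value (42).
first_vs_others = lengths[1:] - lengths[0]

print(first_vs_second.tolist())  # [-3]
print(first_vs_others.tolist())  # [-3, -4, -3]
```

A woman with k pregnancies contributes k - 1 differences under the variant, so women with more pregnancies also carry more weight in the histogram.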

burmecia commented 6 years ago

Agreed. I think my suggestion might better answer the question "Are first babies more likely to be late for an individual woman?".

stellalgh commented 3 years ago

hist[diff] += 1

Could someone explain what this line of code means? Thanks.

burmecia commented 3 years ago

hist is a counter; it keeps a count of records for each pregnancy length difference. For example, hist[0] holds the number of cases with zero difference between the 1st and 2nd babies' pregnancy lengths, that is, cases where both pregnancies were the same length. So hist[diff] += 1 just increments that counter each time it sees that particular difference.
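A small standalone sketch of the same counting pattern may help. It uses collections.Counter as a stand-in for thinkstats2.Hist (an assumption made only so the example runs without the book's code), and the pairs of pregnancy lengths are made up.

```python
from collections import Counter

import numpy as np

# Hypothetical (first, second) pregnancy lengths in weeks for four women.
pairs = [(39, 39), (40, 38), (38, 40), (41, 39)]

hist = Counter()  # stand-in for thinkstats2.Hist; missing keys count as 0

for first, second in pairs:
    # np.diff returns second minus first, so a negative value
    # means the first pregnancy was longer.
    diff = int(np.diff([first, second])[0])
    hist[diff] += 1  # one more case with this difference

print(hist[0])   # 1
print(hist[-2])  # 2
print(hist[2])   # 1
```

Each key is a difference in weeks and each value is how many pairs produced that difference.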