avehtari / ROS-Examples

Regression and other stories R examples
https://avehtari.github.io/ROS-Examples/

Question 10.5 - incorrect data? #78

Closed (maw501 closed this issue 3 years ago)

maw501 commented 3 years ago

Hi,

Loving the book - thank you! I have a question about some confusion regarding the data for one of the questions.

Question 10.5 states:

You have access to children's test scores at age 3, mother's education, and the mother's age at the time she gave birth for a sample of 400 children.

Issue 1

This is minor, but helpful background.

kidiq.csv contains 434 samples and the columns kid_score, mom_hs, mom_iq, mom_work and mom_age. As such, the number of samples stated in the question (400) is incorrect, which affects part (d) of the question.
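For anyone wanting to reproduce the count, here is a minimal sketch in Python (standard library only). The `describe_csv` helper is hypothetical, and the inline sample is a made-up two-row stand-in that only borrows the column names listed above; it is not the real data.

```python
import csv
import io


def describe_csv(handle):
    """Return (row_count, column_names) for a CSV file-like object with a header row."""
    reader = csv.DictReader(handle)
    rows = list(reader)  # consumes the file; the header row is not counted
    return len(rows), list(reader.fieldnames or [])


# Made-up stand-in using the kidiq.csv column names; the values are illustrative only.
sample = io.StringIO(
    "kid_score,mom_hs,mom_iq,mom_work,mom_age\n"
    "70,1,100.0,2,25\n"
    "90,0,95.5,1,22\n"
)

n_rows, columns = describe_csv(sample)
print(n_rows, columns)
```

Pointing the same helper at a local copy, for example `with open("kidiq.csv", newline="") as f: describe_csv(f)`, should report the 434 rows and five columns described above if the file matches what I see.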

Issue 2

Part (a) asks us to:

Fit a regression of child test scores on mother’s age

Part (b) asks us to:

Repeat this for a regression that further includes mother’s education

Part (c) asks us to:

Now create an indicator variable reflecting whether the mother has completed high school or not.

However there is no feature that contains mother's education (separate from mom_iq).

What's gone on?

I originally came across this question from its almost identical counterpart in Gelman and Hill, Q3.4. The data for that actually contains another file, child.iq.dta (as well as kid.iq.dta which is the same as the RoS data). This child-iq data has 400 samples and columns: ppvt, educ_cat and momage.

In other words, it sounds like the question is referring to this child.iq.dta dataset and not the kid.iq.dta dataset, as it (i) has the right number of columns (3), (ii) has the right number of samples (400), (iii) contains a variable for education level and (iv) has no feature for whether the mother completed high school, which fits with part (c) asking us to create one (presumably this can be done from the educ_cat feature with knowledge of the levels).
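To make point (iv) concrete, here is a rough sketch of how the indicator could be derived. It assumes educ_cat is coded 1 = did not finish high school, 2 = high school graduate, 3 = some college, 4 = college graduate; that coding is my assumption and is not confirmed anywhere in this thread. The function name and the sample values are hypothetical.

```python
# ASSUMPTION (not confirmed in this thread): educ_cat is coded
# 1 = did not finish high school, 2 = high school graduate,
# 3 = some college, 4 = college graduate.
VALID_LEVELS = (1, 2, 3, 4)
HS_THRESHOLD = 2  # lowest level counted as "completed high school"


def completed_high_school(educ_cat):
    """Return 1 if the assumed educ_cat level implies high school completion, else 0."""
    if educ_cat not in VALID_LEVELS:
        raise ValueError(f"unexpected educ_cat level: {educ_cat!r}")
    return 1 if educ_cat >= HS_THRESHOLD else 0


# Made-up educ_cat values, for illustration only.
educ_cat_values = [1, 2, 3, 4, 2]
mom_hs = [completed_high_school(v) for v in educ_cat_values]
print(mom_hs)  # [0, 1, 1, 1, 1]
```

If the actual levels differ, only `VALID_LEVELS` and `HS_THRESHOLD` would need to change.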

Also, I can't see how to link the two datasets.

I'm thus wondering if, whilst writing the question, there has been some confusion between these two datasets?

I hope this makes sense; it's a bit convoluted to explain. Any clarification appreciated!

avehtari commented 3 years ago

Sorry for the delay. Yes, the question is referring to child.iq.dta, which we'll add to the repo. We are still checking whether there are other inconsistencies related to these files.

avehtari commented 3 years ago

I forgot to mention that the new data has been in the repo since 6th April.