maw501 closed this issue 3 years ago
Sorry for the delay. Yes, the question refers to child.iq.dta, which we'll add to the repo. We are still checking whether there are other inconsistencies related to these files.
I forgot to mention that the new data has been available since 6th April.
Hi,
Loving the book - thank you! I have a question about some confusion regarding the data for one of the questions.
Question 10.5 states:
Issue 1
This is minor, but helpful background.
The kidiq.csv contains 434 samples and columns: kid_score, mom_hs, mom_iq, mom_work, mom_age. You can view this here. As such, the number of samples stated in the question is incorrect, which affects part (d) of the question.
Issue 2
Part (a) asks us to:
Part (b) asks us to:
Part (c) asks us to:
However, there is no feature that contains mother's education (separate from mom_iq).
What's gone on?
I originally came across this question from its almost identical counterpart in Gelman and Hill, Q3.4. The data for that actually contains another file, child.iq.dta (as well as kid.iq.dta, which is the same as the RoS data). This child-iq data has 400 samples and columns: ppvt, educ_cat and momage.
In other words, it sounds like the question is referring to this child.iq.dta dataset and not the kid.iq.dta dataset as it (i) has the right number of columns (3), (ii) has the right number of samples (400), (iii) contains a variable for education level and (iv) doesn't have a feature for whether the mother completed high-school and asks for this to be created (presumably this can be done from the educ_cat feature with knowledge of the levels).
Also, I can't see how to link the two datasets.
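For anyone wanting to reproduce the comparison, here is a minimal sketch of the four checks (i) to (iv) above, written in Python. The row counts and column names are the ones reported in this issue. The function names, the dictionary layout and the cut-off used to derive the high-school indicator are my own assumptions for illustration. In particular, the sketch assumes educ_cat is an ordered code where level 2 or above means the mother completed high school, which should be confirmed against the data documentation.

```python
# Dataset summaries as reported in this issue (not read from the files).
KIDIQ = {
    "rows": 434,
    "columns": ["kid_score", "mom_hs", "mom_iq", "mom_work", "mom_age"],
}
CHILD_IQ = {
    "rows": 400,
    "columns": ["ppvt", "educ_cat", "momage"],
}


def matches_question(summary, expected_rows=400, expected_cols=3,
                     education_col="educ_cat", derived_col="mom_hs"):
    """Run checks (i)-(iv) from the post against a dataset summary."""
    cols = summary["columns"]
    return {
        "right_column_count": len(cols) == expected_cols,   # (i)
        "right_row_count": summary["rows"] == expected_rows,  # (ii)
        "has_education_level": education_col in cols,        # (iii)
        "lacks_hs_indicator": derived_col not in cols,       # (iv)
    }


def mom_hs_from_educ_cat(educ_cat, hs_level=2):
    """Derive a 0/1 high-school indicator from educ_cat.

    ASSUMPTION: educ_cat is ordered and hs_level (2 here) is the first
    level that means "completed high school". Check the codebook.
    """
    return int(educ_cat >= hs_level)


for name, summary in [("kidiq", KIDIQ), ("child.iq", CHILD_IQ)]:
    checks = matches_question(summary)
    print(name, "matches the question:", all(checks.values()), checks)
```

With pandas installed, the same summaries could be built from the real files by loading them with pd.read_csv or pd.read_stata and taking len(df) and list(df.columns).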
I'm thus wondering if, whilst writing the question, there has been some confusion between these two datasets?
I hope this makes sense, it's a bit convoluted to explain. Any clarification appreciated!