awunderground closed 3 months ago
It's possible that there are NA values for low birthweight with _quality values of 3. Make sure the quality variable is missing if there is no value reported.
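A minimal sketch of that check, written in Python for illustration only (the project's own pipeline may look quite different). The column name rate_low_birth_weight_quality is an assumption built from the metric name and the _quality suffix, and the sample values are made up.

```python
import math
from typing import Optional


def is_missing(value) -> bool:
    """True when the metric value is None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_quality(value, quality: Optional[int]) -> Optional[int]:
    """Keep the quality flag only when a value was actually reported."""
    return None if is_missing(value) else quality


# Illustrative rows; the names and numbers are hypothetical, not real data.
rows = [
    {"rate_low_birth_weight": 0.081, "rate_low_birth_weight_quality": 1},
    {"rate_low_birth_weight": None, "rate_low_birth_weight_quality": 3},
]

cleaned = [
    {
        **row,
        "rate_low_birth_weight_quality": clean_quality(
            row["rate_low_birth_weight"], row["rate_low_birth_weight_quality"]
        ),
    }
    for row in rows
]
```

After this step, the second row keeps its missing value and its quality flag is also missing, instead of carrying a stray 3.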
The categories for mother's education should be: less than HS, HS/GED, some college, and bach+. For naming, the subgroup_type can be mothers-education and the categories can be: "Less than high school", "GED or high school degree", "Some college", and "College degree or higher". Note that each of these categories will also have a quality value.
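The naming above could be captured in a small lookup, sketched here in Python. The short keys mirror the shorthand in this comment; the "subgroup" field name and the helper function are hypothetical and only there to show the shape.

```python
# subgroup_type string proposed in the comment above.
SUBGROUP_TYPE = "mothers-education"

# Shorthand category -> full string to use in the output.
EDUCATION_LABELS = {
    "less than HS": "Less than high school",
    "HS/GED": "GED or high school degree",
    "some college": "Some college",
    "bach+": "College degree or higher",
}


def label_subgroup(short_label: str) -> dict:
    """Return the subgroup_type and full category string for a shorthand label.

    Raises KeyError for an unknown label so that typos surface early.
    The "subgroup" key is an assumed column name.
    """
    return {
        "subgroup_type": SUBGROUP_TYPE,
        "subgroup": EDUCATION_LABELS[short_label],
    }
```

For example, label_subgroup("HS/GED") yields the full string "GED or high school degree" under subgroup_type mothers-education.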
Mother's Education Subgroup types for education: Use full strings
PR is #285
We discussed the data quality assignment. A lot of it depends on which counties the CDC assigns to "Unidentified Counties." In their documentation, the CDC says: "County-level data are shown for counties with populations of 100,000 persons or more. All counties with fewer than 100,000 persons are shown combined together under the label 'Unidentified Counties' for the state." In a few states, this makes up the majority of counties (e.g., Alaska). Some states have very few counties, and it might be 2 of 5 that are combined.

Up until now, every county flagged as "unidentified" (around 2,500 of 3,143 counties) was marked with a data quality flag of poor (quality=3), which we say not to use. We decided to move some of these up to marginal quality (quality=2), namely the cases where a smaller share of a state's counties is combined. In cases where the number of unidentified counties is >0 and <=5, we automatically assign those as marginal quality (quality=2). We also took the ratio of the number of unidentified counties in a state to the total number of counties in that state. If the ratio is <.5 and the number of counties is <=15, then those should also have a data quality of 2.

We have other cases where such a large majority of the state's counties are "unidentified" that the value is basically state-wide and is not a good representation of any single county. This change only tweaks the quality flags, but it does bring some counties up from 3. We should revisit and further scrutinize quality in the next round.
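The rule described above can be sketched as a small Python function. This is an illustration under stated assumptions, not the code from the PR: the function and argument names are hypothetical, and "the number of counties <=15" is read here as the number of unidentified counties, which the comment does not spell out.

```python
MARGINAL = 2  # quality=2
POOR = 3      # quality=3, which we say not to use


def assign_unidentified_quality(n_unidentified: int, n_total: int) -> int:
    """Quality flag for a state's counties combined as "Unidentified Counties".

    n_unidentified: number of counties in the state that are combined.
    n_total: total number of counties in the state.
    """
    if n_unidentified <= 0 or n_unidentified > n_total:
        raise ValueError("expected 0 < n_unidentified <= n_total")

    # Rule 1: a handful of combined counties is automatically marginal.
    if n_unidentified <= 5:
        return MARGINAL

    # Rule 2: a minority of the state's counties, and not too many of them.
    # Assumption: the <=15 threshold applies to the unidentified count.
    ratio = n_unidentified / n_total
    if ratio < 0.5 and n_unidentified <= 15:
        return MARGINAL

    # Everything else keeps the previous flag of poor.
    return POOR
```

With made-up counts: 2 of 5 combined counties gives marginal, 10 of 30 gives marginal (ratio about 0.33), while 10 of 15 (ratio about 0.67) and 20 of 100 (more than 15 combined) both stay poor.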
Closed with #285
rate_low_birth_weight