UI-Research / mobility-from-poverty


6. Update share of low-weight births #205

Closed awunderground closed 3 months ago

awunderground commented 9 months ago
awunderground commented 4 months ago

It's possible that there are NA values for low birthweight with _quality values of 3. Make sure the quality variable is missing if there is no value reported.
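A minimal sketch of that check, in plain Python rather than the project's own code. The column names `share_lbw` and `share_lbw_quality` and the helper `blank_quality_if_missing` are hypothetical, chosen only to illustrate the `_quality` convention mentioned above.

```python
import math

def blank_quality_if_missing(value, quality):
    """Return None for the quality flag whenever the metric itself is missing."""
    missing = value is None or (isinstance(value, float) and math.isnan(value))
    return None if missing else quality

# Hypothetical rows: the second and third have no reported value but a quality of 3.
rows = [
    {"share_lbw": 0.081, "share_lbw_quality": 1},
    {"share_lbw": None, "share_lbw_quality": 3},
    {"share_lbw": float("nan"), "share_lbw_quality": 3},
]

# Rebuild each row so the quality flag is blank wherever the value is blank.
cleaned = [
    {**row, "share_lbw_quality": blank_quality_if_missing(row["share_lbw"], row["share_lbw_quality"])}
    for row in rows
]
```

After this pass, a quality flag appears only on rows that actually report a value.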

cdsolari commented 4 months ago

The categories for mother's education should be: less than HS, HS/GED, some college, bach+. For naming, the subgroup_type can be mothers-education and the categories can be:

- Less than high school
- GED or high school degree
- Some college
- College degree or higher

Note that each of these categories will also have a quality value.

JCarterUI commented 4 months ago

Mother's education: for the education subgroup types, use full strings.

awunderground commented 4 months ago

PR is #285

cdsolari commented 3 months ago

We discussed the data quality assignment. A lot of it depends on which counties the CDC reports as unidentified. In their documentation, the CDC says:

> County-level data are shown for counties with populations of 100,000 persons or more. All counties with fewer than 100,000 persons are shown combined together under the label "Unidentified Counties" for the state.

In a few states, this makes up the majority of counties (e.g., Alaska). Some states have very few counties, and it might be 2 of 5 that are combined.

Up until now, any counties flagged as "unidentified" (around 2,500 of 3,143 counties) were marked with a data quality flag of poor, which we say not to use. We decided to move some of these up to a data quality of marginal, namely cases where a smaller share of the counties in a state is combined:

- In cases where the number of unidentified counties is >0 and <=5, we automatically assign marginal quality (quality=2).
- We took the ratio of the number of unidentified counties in a state to the total number of counties in that state. If the ratio is <.5 and the number of counties is <=15, then those should also be a data quality of 2.

We still have other cases where such a large majority of the state's counties are "unidentified" that the value is basically state-wide and is not a good representation of any single county. Those stay at 3.

This really only tweaked the quality. We should scrutinize it further and consider revisiting it in the next round.
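The two rules can be sketched as a small Python function. This is an illustration and not the code in the PR: the function name `unidentified_quality` is hypothetical, and it assumes the "number of counties <=15" condition refers to the number of unidentified counties in the state, which the description above leaves ambiguous.

```python
def unidentified_quality(n_unidentified: int, n_total: int) -> int:
    """Quality flag (2 = marginal, 3 = poor) for a state's "Unidentified Counties".

    n_unidentified: counties the CDC pools together in the state.
    n_total: all counties in the state.
    """
    if n_total <= 0 or not 0 < n_unidentified <= n_total:
        raise ValueError("expected 0 < n_unidentified <= n_total")

    # Rule 1: a handful of pooled counties (>0 and <=5) is always marginal.
    if n_unidentified <= 5:
        return 2

    # Rule 2: fewer than half the state's counties are pooled.
    # ASSUMPTION: the <=15 cap applies to the unidentified count.
    ratio = n_unidentified / n_total
    if ratio < 0.5 and n_unidentified <= 15:
        return 2

    # Otherwise the pooled value is close to a state-wide figure: poor.
    return 3
```

For example, a state with 2 of 5 counties pooled gets a 2 under the first rule, 10 of 25 gets a 2 under the second, and 27 of 30 stays at 3.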

awunderground commented 3 months ago

Closed with #285