Race/ethnicity enum doesn't match Census standards

VandyHacks / vaken

Next-gen hackathon registration system

MIT License


Race/ethnicity enum doesn't match Census standards #319

Open bencooper222 opened 5 years ago

bencooper222 commented 5 years ago

From the Census Bureau:

The U.S. Census Bureau considers race and ethnicity to be two separate and distinct concepts (source)

Race and ethnicity are messy concepts, and I'm not arguing that the Census classification is perfect. However, deviating from its methodology hurts our ability to compare our data to population-level statistics.

The proper way to do this is to mimic the Census with less fidelity (see below for what the Census does). I don't think we need to record a Native American respondent's tribe, the specific Asian country of origin, or the specific Hispanic origin. So we can just use a boolean for Hispanic status and a "check all that apply" list for race.

https://github.com/VandyHacks/vaken/blob/039f3dc77374432aa272559fb24977bf8920ffb5/src/common/schema.graphql.ts#L36
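A minimal sketch of the shape proposed above, assuming plain TypeScript types. The names `CensusRace`, `Demographics`, `isHispanicOrLatino` and `normalizeDemographics` are hypothetical and are not vaken's actual schema. The race values are the five minimum categories from the 1997 OMB standards that the Census used at the time, plus the Census's "Some Other Race" option. Hispanic origin is a separate yes/no field rather than another race value.

```typescript
// Hypothetical sketch, not vaken's schema: race and Hispanic origin are
// stored separately, following the Census Bureau's approach.

// Five minimum OMB (1997) race categories, plus the Census "Some Other Race" option.
enum CensusRace {
  AmericanIndianOrAlaskaNative = 'AMERICAN_INDIAN_OR_ALASKA_NATIVE',
  Asian = 'ASIAN',
  BlackOrAfricanAmerican = 'BLACK_OR_AFRICAN_AMERICAN',
  NativeHawaiianOrOtherPacificIslander = 'NATIVE_HAWAIIAN_OR_OTHER_PACIFIC_ISLANDER',
  White = 'WHITE',
  SomeOtherRace = 'SOME_OTHER_RACE',
}

interface Demographics {
  // Ethnicity: a separate yes/no question; null means "prefer not to answer".
  isHispanicOrLatino: boolean | null;
  // Race: "check all that apply", so zero or more selections.
  races: CensusRace[];
}

// Drops duplicate race selections so each race is counted once per applicant.
function normalizeDemographics(d: Demographics): Demographics {
  return {
    ...d,
    races: d.races.filter((race, i, arr) => arr.indexOf(race) === i),
  };
}

const example: Demographics = normalizeDemographics({
  isHispanicOrLatino: true,
  races: [CensusRace.White, CensusRace.BlackOrAfricanAmerican, CensusRace.White],
});
console.log(example.races.length); // 2
```

Keeping ethnicity as its own field means a respondent who is Hispanic and Black, for example, never has to choose between the two, which is the misclassification problem raised later in this thread.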

cktang88 commented 5 years ago

I don't see the benefit of this. I think the existing implementation is fine, except that we should also have an "Other" field. Most hackathons only use this data to roughly estimate diversity, and the current categorization is sufficient for that, IMHO.

bencooper222 commented 5 years ago

By not doing this we save maybe 20 lines of code (we're talking about adding a checkbox and a field), and in exchange we get less accurate estimates. More worryingly, our estimates become invalid in the formal sense, because we can't quantify their error at all. We're not running a study where that would be disqualifying, but giving up accuracy to avoid writing almost no extra code seems like a bad trade.

cktang88 commented 5 years ago

I don't see what is inaccurate about our current categorization. For example, the demographic stats we got from VH5 seem accurate and perfectly fine.

bencooper222 commented 5 years ago

I can get us some hard numbers later on which groups we'd have trouble capturing. That said, I think we'd struggle to identify bad data if we continue with the approach of past years. How do you identify someone who was forced to misclassify themselves?

cktang88 commented 5 years ago

Well, the "Other" option would let us identify those who were previously misclassified.

bencooper222 commented 5 years ago

Our numbers aren’t valuable by themselves; they’re valuable in comparison to others’. That’s why I think we should stick to the same format that everyone else uses.
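To illustrate the comparability point, here is a hypothetical sketch of how "check all that apply" answers could be rolled up into buckets shaped like the Census's published tables: "<race> alone" for a single selection, a separate "Two or More Races" bucket, and a Hispanic count kept independent of race. The short race codes (`AIAN`, `NHPI`, etc.) and the function name `tabulate` are illustrative assumptions, not anything in vaken.

```typescript
// Hypothetical sketch: tabulate multi-select race answers into
// Census-style "alone" / "Two or More Races" buckets.
// Hispanic origin is counted separately from race.

type Race = 'AIAN' | 'ASIAN' | 'BLACK' | 'NHPI' | 'WHITE' | 'OTHER';

interface SurveyResponse {
  hispanic: boolean;
  races: Race[];
}

interface Tabulation {
  alone: Record<Race, number>; // exactly one race selected
  twoOrMore: number;           // two or more distinct races selected
  noRaceReported: number;      // no race selected
  hispanic: number;            // Hispanic or Latino, of any race
  total: number;
}

function tabulate(responses: SurveyResponse[]): Tabulation {
  const alone: Record<Race, number> = { AIAN: 0, ASIAN: 0, BLACK: 0, NHPI: 0, WHITE: 0, OTHER: 0 };
  let twoOrMore = 0;
  let noRaceReported = 0;
  let hispanic = 0;
  for (const r of responses) {
    // Ignore duplicate selections before classifying the response.
    const distinct = r.races.filter((race, i, arr) => arr.indexOf(race) === i);
    if (distinct.length === 0) noRaceReported++;
    else if (distinct.length === 1) alone[distinct[0]]++;
    else twoOrMore++;
    if (r.hispanic) hispanic++;
  }
  return { alone, twoOrMore, noRaceReported, hispanic, total: responses.length };
}

const t = tabulate([
  { hispanic: true, races: ['WHITE'] },
  { hispanic: false, races: ['ASIAN', 'WHITE'] },
  { hispanic: true, races: [] },
]);
console.log(t.alone.WHITE, t.twoOrMore, t.noRaceReported, t.hispanic); // 1 1 1 2
```

Dividing each bucket by `total` gives shares that line up category-for-category with the Census buckets. A single-choice enum that mixes race and ethnicity cannot be mapped onto them this way.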