bencooper222 opened 5 years ago
I don't see the benefit of this. I think the existing implementation is fine, except we should also have an "other" field. Most hackathons only use this data to roughly estimate diversity, and the current categorization is sufficient for that, imho.
We'd save maybe 20 lines of code (we're talking about adding a checkbox and a field) and, in exchange, get less accurate estimates. More worryingly, our estimates become invalid in the formal sense because we can't quantify the error at all. We're not running a study, where that would be completely disqualifying, but trading accuracy to avoid basically no extra code seems like a bad choice.
I don't get what is inaccurate about our current categorization. For example, the demographic stats we got from VH5 seem accurate and perfectly fine.
I can get us some hard numbers later on which groups we'd have trouble capturing. That said, I think we'd struggle to identify bad data if we continue with the approach of past years. How do you identify someone who was forced to misclassify themselves?
Well, the "other" option would enable us to identify those who were previously misclassified.
Our numbers aren't valuable by themselves; they're valuable in comparison to others'. That's why I think we should stick to the same format everyone else uses.
From the Census Bureau:
Race and ethnicity are messy concepts, and I'm not arguing that the Census' classification is perfect. However, deviating from their methodology undermines our ability to make comparisons against population-level statistics.
The proper way to do this is to mimic the Census, but with less fidelity (see below for what the Census does). I don't think we need to worry about a Native American respondent's tribe, the specific Asian country of origin, or the specific Hispanic origin, so we can just do a boolean for Hispanic status and a "check all that apply" for race.
https://github.com/VandyHacks/vaken/blob/039f3dc77374432aa272559fb24977bf8920ffb5/src/common/schema.graphql.ts#L36
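For concreteness, here's a minimal sketch of the shape I mean. The type and field names below are illustrative, not vaken's actual schema; the point is just the structure: a standalone Hispanic/Latino boolean plus a multi-select race field, mirroring the Census approach with less fidelity.

```typescript
// Hypothetical shape (names are illustrative, not from vaken's schema).
// Ethnicity is a standalone yes/no, separate from race, like the Census does it.
enum Race {
  AmericanIndianOrAlaskaNative = "AMERICAN_INDIAN_OR_ALASKA_NATIVE",
  Asian = "ASIAN",
  BlackOrAfricanAmerican = "BLACK_OR_AFRICAN_AMERICAN",
  NativeHawaiianOrPacificIslander = "NATIVE_HAWAIIAN_OR_PACIFIC_ISLANDER",
  White = "WHITE",
  Other = "OTHER",
}

interface Demographics {
  // The checkbox: Hispanic or Latino origin, independent of race.
  hispanicOrLatino: boolean;
  // The "check all that apply" field: zero or more race selections.
  races: Race[];
}

// Example: a respondent who checks the Hispanic box and two race boxes.
const example: Demographics = {
  hispanicOrLatino: true,
  races: [Race.White, Race.Other],
};
```

Since `races` is an array, someone who was previously forced into a single box can now select multiple options or "other", which is exactly the misclassification case discussed above.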