MEDSL / 2022-elections-official

Official returns for the 2022 Midterm Elections

Some FIPS codes are not 5 digit strings #21

Open NickCrews opened 6 months ago

NickCrews commented 6 months ago

FIPS codes are supposed to be 5-digit strings, with leading 0s if relevant. A few errors I've found:

I am still very happy to write some testing/QA scripts for your exported .csvs that might catch some of these common errors. Please let me know if that would be useful.

Thank you!

sbaltzmit commented 6 months ago

You're right about the 4-digit FIPS codes, thanks for that! And clearly RI got caught with a data type error. I'll add those latter 4 states to the list to check. The 10-digit FIPS codes are the official municipality or township-level FIPS codes, which are the appropriate designator of jurisdiction in states that do not administer elections at the county level. I'll look into the Hartford County situation: if that's a county FIPS code it's probably fine, but if it's a jurisdiction FIPS code it should probably have a suffix.
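To make the 5-digit versus 10-digit distinction concrete, here is a minimal sketch of how a code could be split into its parts. It assumes the 10-digit codes follow the Census county-subdivision layout (2-digit state, 3-digit county, 5-digit subdivision); the function name `split_fips` and the sample subdivision digits are hypothetical and not taken from the repository.

```python
def split_fips(code: str) -> dict:
    """Split a zero-padded FIPS string into its components.

    Assumes 5-digit codes are state+county and 10-digit codes are
    state+county+county subdivision (an assumption, not the repo's rule).
    """
    if not (code.isascii() and code.isdigit()):
        raise ValueError(f"FIPS code must contain only digits: {code!r}")
    if len(code) == 5:
        return {"state": code[:2], "county": code[2:], "subdivision": None}
    if len(code) == 10:
        return {"state": code[:2], "county": code[2:5], "subdivision": code[5:]}
    raise ValueError(f"expected 5 or 10 digits, got {len(code)}: {code!r}")


# "09003" is a county-level code; the 10-digit value is made up for illustration.
county = split_fips("09003")
jurisdiction = split_fips("0900312345")
```

Under this layout, a bare 5-digit value in a state that reports by town would show up with `subdivision` set to `None`, which is one way to spot a jurisdiction code that is missing its suffix.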

On the topic of QA, related to conversations we've had about this in two other Issues, we have scripts that both automatically apply padding to coerce the FIPS code into a 5-digit zero-padded string and then check for every issue you've raised, raising a flag if there is a problem. But QA on a dataset like this is a very involved process with a lot of flags. As I know you well understand since you've mentioned it in the past, consistently catching every subtle data type issue in a nearly 15 million row dataset with regular updates is a matter of more than just having a QA script. I really appreciate when you raise data issues that we can address. But this is fair warning that I won't engage with further Issues that continue to imply that we don't do QA; we spend a very long time doing very extensive QA.
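For readers following along, the pad-then-check idea described above might look roughly like the sketch below. This is not the project's actual QA script: the names `normalize_fips` and `find_bad_fips` are hypothetical, and the rules (4 or 5 digits pad to 5, 9 or 10 digits pad to 10, a trailing `.0` from a float column is dropped) are assumptions chosen for illustration.

```python
def normalize_fips(value) -> str:
    """Coerce a raw FIPS value to a zero-padded digit string.

    Assumed rules (not the project's): 4-5 digits -> 5-digit county code,
    9-10 digits -> 10-digit jurisdiction code, anything else is an error.
    """
    text = str(value).strip()
    # A column read as floats yields strings such as "1001.0".
    if text.endswith(".0"):
        text = text[:-2]
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"non-numeric FIPS value: {value!r}")
    if len(text) in (4, 5):
        return text.zfill(5)
    if len(text) in (9, 10):
        return text.zfill(10)
    raise ValueError(f"unexpected FIPS length {len(text)}: {value!r}")


def find_bad_fips(values) -> list:
    """Return (row index, raw value) pairs that cannot be normalized."""
    flagged = []
    for index, value in enumerate(values):
        try:
            normalize_fips(value)
        except ValueError:
            flagged.append((index, value))
    return flagged


flags = find_bad_fips(["01001", 1001, "1001.0", "abc", 123])
```

Padding silently repairs the dropped-leading-zero case, while `find_bad_fips` only reports values that no padding rule can explain, so the two steps catch different kinds of problems.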