BuzzFeedNews / zika-data

Data — and pointers to data — related to the 2015–16 Zika virus outbreak.
MIT License

Colombia municipal data count mismatch #6

Open jsvine opened 8 years ago

jsvine commented 8 years ago

Data parsed from Colombia's latest municipality-level PDF provides these totals:

But the sum of these columns from the data I've parsed provides these totals:

Am I missing one/several municipalities? Are the official totals wrong? Do they include unlisted municipalities? I've scoured the two files, but can't find the source of the discrepancy. Any ideas?
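One way to narrow this down is to recompute the column sums from the parsed file and diff them against the totals reported in the PDF. The snippet below is only a rough sketch: the column names (`confirmed`, `suspected`), the sample rows, and the "official" totals are all made-up placeholders, not values from the actual Colombia files, and the helpers `column_sums` and `compare_totals` are hypothetical names.

```python
import csv
import io


def column_sums(rows, columns):
    """Sum each named column across all rows, treating blank cells as zero."""
    totals = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = (row.get(col) or "").strip()
            totals[col] += int(value) if value else 0
    return totals


def compare_totals(parsed, official):
    """Return {column: parsed - official} for every column that disagrees."""
    return {
        col: parsed.get(col, 0) - official[col]
        for col in official
        if parsed.get(col, 0) != official[col]
    }


# Made-up sample rows for illustration only; not real figures from the PDF.
SAMPLE_CSV = """municipality,confirmed,suspected
A,3,10
B,,7
C,2,5
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
parsed = column_sums(rows, ["confirmed", "suspected"])

# Hypothetical totals standing in for the ones printed in the PDF.
official = {"confirmed": 5, "suspected": 25}

diff = compare_totals(parsed, official)
print(diff)
```

A negative difference would suggest rows missing from the parsed data (or totals that include unlisted municipalities); a positive one would point toward duplicated rows or a parsing error.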

chendaniely commented 8 years ago

@bryanleroylewis you mentioned [1] that we (@ndssl/zika) were tapped for Colombia data. Any thoughts about this data quality issue?

The frequency counts seem small enough (relatively) that we can start cleaning the data up for the CDC now, but we should still address this issue eventually.

[1] https://github.com/ndssl/zika_data_to_cdc/pull/1