covid19datahub / COVID19

A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution
https://covid19datahub.io
GNU General Public License v3.0
251 stars, 93 forks

USA data - key_alpha_2 is blank #175

Closed: hlcny closed this issue 2 years ago

hlcny commented 2 years ago

Hi, the data frame returned by covid19(country = "USA", level = 2) shows NA for key_alpha_2 in every row, whereas it used to have a valid key for each state. Is this intentional? This change has broken my workflow, so I'd like to know whether I need to rewrite my code or can wait for a fix. Thx.

eguidotti commented 2 years ago

Hi @hlcny, this is intentional: key_alpha_2, key_numeric, and key will all be replaced by key_local. This column reports the local identifier used by the national institute of statistics or its local equivalent, which should be easier to maintain. For the US, it will be the FIPS code. Does your workflow require the alpha code instead of FIPS?

hlcny commented 2 years ago

Thanks. As written, my workflow required the alpha code, but it's not difficult to switch now that I know the change is permanent. I just fixed my workflow by re-populating key_alpha_2 with the two-letter alpha codes, derived from the key_numeric code already in the database with the help of an R package that automates the concordances:

usa2$key_numeric <- sprintf("%02d", usa2$key_numeric)  # zero-pad state FIPS, e.g. 6 -> "06"
usa2$key_alpha_2 <- fips_abbr(usa2$key_numeric)        # state FIPS -> two-letter state code
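If you would rather not depend on an extra package for this concordance, a base-R lookup table is enough. The sketch below is an alternative under stated assumptions, not the package function used above: `state_fips` and `state_fips_to_abbr()` are hypothetical names, and the table covers the 50 states, DC, and the five inhabited territories using their standard two-digit FIPS codes. It zero-pads its input first, so it accepts either numeric codes or padded strings.

```r
# Hypothetical base-R alternative to a FIPS -> postal abbreviation package.
# Names are two-digit state FIPS codes; values are USPS abbreviations.
state_fips <- c(
  "01" = "AL", "02" = "AK", "04" = "AZ", "05" = "AR", "06" = "CA",
  "08" = "CO", "09" = "CT", "10" = "DE", "11" = "DC", "12" = "FL",
  "13" = "GA", "15" = "HI", "16" = "ID", "17" = "IL", "18" = "IN",
  "19" = "IA", "20" = "KS", "21" = "KY", "22" = "LA", "23" = "ME",
  "24" = "MD", "25" = "MA", "26" = "MI", "27" = "MN", "28" = "MS",
  "29" = "MO", "30" = "MT", "31" = "NE", "32" = "NV", "33" = "NH",
  "34" = "NJ", "35" = "NM", "36" = "NY", "37" = "NC", "38" = "ND",
  "39" = "OH", "40" = "OK", "41" = "OR", "42" = "PA", "44" = "RI",
  "45" = "SC", "46" = "SD", "47" = "TN", "48" = "TX", "49" = "UT",
  "50" = "VT", "51" = "VA", "53" = "WA", "54" = "WV", "55" = "WI",
  "56" = "WY", "60" = "AS", "66" = "GU", "69" = "MP", "72" = "PR",
  "78" = "VI"
)

state_fips_to_abbr <- function(code) {
  # Normalise numeric (6) or character ("6", "06") input to "06".
  padded <- sprintf("%02d", as.integer(code))
  # Unknown or missing codes look up to NA instead of raising an error.
  unname(state_fips[padded])
}
```

Assuming key_local holds the state FIPS code at level 2, as described above, the call would be `usa2$key_alpha_2 <- state_fips_to_abbr(usa2$key_local)`. Returning NA for unmatched codes keeps a bad code visible in the data rather than stopping the whole workflow.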

I could do the same with key_local, now that I know it is intended to be the "master key".

BTW: If you are going to switch entirely to FIPS, I'd recommend properly formatting them to always have the correct number of digits, with leading zeros. I know it's not difficult to manually pad the zeros, but it's an extra step and potential source of error or frustration when using the data.
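The padding step can be sketched in base R. This is an illustration under assumptions, not code from the COVID19 package: `pad_fips()` is a hypothetical helper, and it assumes the standard FIPS widths of 2 digits for states and 5 for counties. Because a county code's first two digits are its state code, the example also shows how a missing leading zero becomes an error source: taking the prefix of an unpadded county code yields the wrong state.

```r
# Hypothetical helper: left-pad FIPS codes with zeros to a fixed width.
# Standard widths: 2 digits for states, 5 digits for counties.
pad_fips <- function(code, width) {
  n <- as.integer(code)                  # accepts numbers or digit strings
  out <- formatC(n, width = width, format = "d", flag = "0")
  out[is.na(n)] <- NA_character_         # keep missing codes missing
  out
}

# Los Angeles County (06037) stored as a number loses its leading zero.
county_raw <- 6037
substr(as.character(county_raw), 1, 2)   # "60": wrong state prefix
county <- pad_fips(county_raw, 5)        # "06037"
substr(county, 1, 2)                     # "06": California
```

When the width is known in advance, `sprintf()` with a fixed width, as in the snippet above, works just as well; `formatC()` is used here only so the width can be passed as an argument.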

eguidotti commented 2 years ago

> BTW: If you are going to switch entirely to FIPS, I'd recommend properly formatting them to always have the correct number of digits, with leading zeros. I know it's not difficult to manually pad the zeros, but it's an extra step and potential source of error or frustration when using the data.

Definitely yes! key_local should already be available in the current dataset, formatted correctly. If I understand correctly, you could use it directly:

usa2$key_alpha_2 <- fips_abbr(usa2$key_local)