covid19datahub / COVID19

A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution
https://covid19datahub.io
GNU General Public License v3.0

Data are not importing as expected #174

Closed kenshermock closed 2 years ago

kenshermock commented 2 years ago

Were there changes made to the dataset? The variable "state2" looks to be corrupt.

eguidotti commented 2 years ago

Hi @kenshermock and thanks for your message! Yes, I am working on the next release, which I have just announced in the README, but the changes should not be live yet.

It actually seems to work as usual, except for key_numeric, which seems to have disappeared for the moment. As we have no state2 in the dataset, maybe you are using key_numeric (FIPS for the US) to build that column? If so, please switch to key_local. Both key_numeric and key_local will be available in a couple of hours in the next build (so your issue should be fixed automatically), but only key_local will be maintained.

kenshermock commented 2 years ago

Thank you for your reply, Emanuele. You are correct that I was using key_alpha_2 to build state2. However, I’m still seeing lots of anomalies.

In the level 3 data set, once I limit the sample to U.S. counties, there are lots of missing values for variables that would identify the county (especially administrative_area_level_3). Below is Stata output for select variables. I do not see any variable that identifies the county. I had been relying on “administrative_area_level_3”.


key_local (unlabeled)

              type:  string (str9), but longest is str2

     unique values:  51                       missing "":  441,234/472,268

          examples:  ""
                     ""
                     ""
                     ""

. codebook administrative_area_level_3


administrative_area_level_3 (unlabeled)

              type:  string (str39), but longest is str0

     unique values:  0                        missing "":  472,268/472,268

        tabulation:  Freq.  Value
                   472,268  ""

. codebook administrative_area_level_2


administrative_area_level_2 (unlabeled)

              type:  string (str61), but longest is str20

     unique values:  51                       missing "":  441,234/472,268

          examples:  ""
                     ""
                     ""
                     ""

           warning:  variable has embedded blanks
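For readers who do not use Stata, the same kind of check can be sketched in Python. This is only a rough analogue of `codebook`, assuming the data have been read as rows of strings where an empty string means "missing"; the helper name `summarize_column` and the three-row sample are made up for illustration.

```python
import csv
import io


def summarize_column(rows, column):
    """Count total, missing, and unique values for one column (empty string = missing)."""
    values = [(row.get(column) or "").strip() for row in rows]
    non_missing = [v for v in values if v]
    return {
        "total": len(values),
        "missing": len(values) - len(non_missing),
        "unique": len(set(non_missing)),
        "longest": max((len(v) for v in non_missing), default=0),
    }


# Tiny made-up sample standing in for the real level 3 file.
sample = io.StringIO(
    "administrative_area_level_2,administrative_area_level_3,key_local\n"
    "Alabama,,01\n"
    "Alabama,,01\n"
    ",,\n"
)
rows = list(csv.DictReader(sample))

for name in ("administrative_area_level_2", "administrative_area_level_3", "key_local"):
    print(name, summarize_column(rows, name))
```

A column whose `missing` count equals its `total`, as with administrative_area_level_3 above, is the pattern reported in this comment.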

In the level 2 data set, I had been using the variable “key_alpha_2” to create my state2 variable. I believe it was a string variable containing two letters that referred to the abbreviation for the state name. key_alpha_2 now contains only missing values. “key_numeric” and “key_local” are present and contain no missing values:

key_alpha_2 (unlabeled)

              type:  numeric (byte)

             range:  [.,.]                        units:  .
     unique values:  0                        missing .:  31,085/31,085

        tabulation:  Freq.  Value
                    31,085  .

key_numeric (unlabeled)

              type:  numeric (byte)

             range:  [1,56]                       units:  1
     unique values:  51                       missing .:  0/31,085

              mean:   28.9075
          std. dev:    15.689

       percentiles:        10%       25%       50%       75%       90%
                             8        16        29        42        50

key_local (unlabeled)

              type:  string (str9), but longest is str2

     unique values:  51                       missing "":  0/31,085

          examples:  "13"
                     "24"
                     "34"
                     "45"

Thank you, Ken
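As an illustration of the suggested switch to key_local, here is a minimal Python sketch of deriving a two-letter state column such as state2 from it. It assumes key_local holds the two-digit state FIPS code as a string for U.S. level 2 rows, as the examples above suggest; the lookup table is deliberately partial and the function name `state2_from_key_local` is hypothetical.

```python
# Partial lookup from two-digit state FIPS codes to USPS abbreviations.
# Only a handful of entries are listed; a real table would cover every state.
FIPS_TO_USPS = {
    "01": "AL", "02": "AK", "04": "AZ", "05": "AR", "06": "CA",
    "13": "GA", "24": "MD", "34": "NJ", "45": "SC", "72": "PR",
}


def state2_from_key_local(key_local):
    """Return the two-letter abbreviation for a FIPS string, or None if unknown or missing."""
    if key_local is None:
        return None
    code = str(key_local).strip()
    if not code:
        return None
    # Left-pad so that a code that lost its leading zero ("1") still matches "01".
    return FIPS_TO_USPS.get(code.zfill(2))


print(state2_from_key_local("13"))
print(state2_from_key_local(""))
```

Returning None for unknown codes, rather than raising, keeps rows with unexpected keys visible so they can be inspected instead of silently dropped.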

From: Emanuele Guidotti; Date: Thursday, October 14, 2021 at 2:39 AM; Subject: Re: [covid19datahub/COVID19] Data are not importing as expected (#174)


eguidotti commented 2 years ago

Thank you very much for posting this! I fixed the bug and the data should be back in a couple of hours. The other keys will be deprecated, but key_local should contain no missing values for the US. Please let me know if this solves your issue. Many thanks!

kenshermock commented 2 years ago

Thank you so much for your quick response. I’m taking things one step at a time here. Level 2 data appear to be mostly good. The one issue I have found is that U.S. territories, like Puerto Rico, do not have the expected value for “key_local”. The FIPS code for Puerto Rico is 72. However, there is a range of values (e.g., 001, 002…151, 153) in “key_local” when “administrative_area_level_1” = “Puerto Rico”. I’m concerned that this issue may have other unintended effects because “key_local” also takes on a (correct) value of “01” when “administrative_area_level_2” = “Alabama”.
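One way to look for this kind of key problem systematically is sketched below in Python. It assumes each row is a dict with an area name and a key_local value; the column name `area`, the function name `find_key_conflicts`, and the sample rows (including "Example Area") are placeholders for illustration, not actual dataset contents.

```python
from collections import defaultdict


def find_key_conflicts(rows, name_col, key_col="key_local"):
    """Report keys shared by several areas and areas that carry several keys."""
    names_by_key = defaultdict(set)
    keys_by_name = defaultdict(set)
    for row in rows:
        name, key = row.get(name_col), row.get(key_col)
        if not name or not key:
            continue  # rows with missing values are skipped, not flagged
        names_by_key[key].add(name)
        keys_by_name[name].add(key)
    shared_keys = {k: sorted(v) for k, v in names_by_key.items() if len(v) > 1}
    multi_key_names = {n: sorted(v) for n, v in keys_by_name.items() if len(v) > 1}
    return shared_keys, multi_key_names


# Made-up rows mimicking the situation described above.
rows = [
    {"area": "Alabama", "key_local": "01"},
    {"area": "Puerto Rico", "key_local": "001"},
    {"area": "Puerto Rico", "key_local": "002"},
    {"area": "Example Area", "key_local": "01"},
]
shared, multi = find_key_conflicts(rows, name_col="area")
print(shared)
print(multi)
```

An area with more than one key, or a key used by more than one area, is worth inspecting before merging on key_local.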

I’ll continue to work and let you know if I find anything else.

Thanks, again. Ken

From: Emanuele Guidotti; Date: Friday, October 15, 2021 at 1:04 PM; Subject: Re: [covid19datahub/COVID19] Data are not importing as expected (#174)


kenshermock commented 2 years ago

Hi Emanuele,

As I worked with the level 2 dataset, the issue with U.S. territories that I described earlier was the only issue I encountered. As an example, I expected Puerto Rico to have a key_numeric value = 72 in that dataset.

The issues I described with the level 3 dataset remain. Lots of missing values where I would expect to find information. Latest Stata output using codebook command:

. codebook administrative_area_level_3 key_numeric key_local


administrative_area_level_3 (unlabeled)

              type:  string (str38), but longest is str0

     unique values:  0                        missing "":  477,341/477,341

        tabulation:  Freq.  Value
                   477,341  ""

key_numeric (unlabeled)

              type:  numeric (byte)

             range:  [1,56]                       units:  1
     unique values:  51                       missing .:  446,307/477,341

              mean:   28.9074
          std. dev:    15.689

       percentiles:        10%       25%       50%       75%       90%
                             8        16        29        42        50

key_local (unlabeled)

              type:  string (str9), but longest is str2

     unique values:  51                       missing "":  446,307/477,341

          examples:  ""
                     ""
                     ""
                     ""

Thank you and please let me know if I can assist you in any way.

Ken


eguidotti commented 2 years ago

Hi Ken, thank you very much for your help in improving the quality of the dataset. The key for Puerto Rico should be fixed now (please allow a couple of hours for the workflow to complete). Also level 3 data seem to be back to normal. It would be great if you could confirm that everything works for you.

kenshermock commented 2 years ago

Hi Emanuele,

You are very welcome. The datasets are performing better, but I do have a few comments.

Previously, in the administrative level 2 dataset, U.S. territories (e.g., “Puerto Rico”, “Virgin Islands, U.S.”) were treated like states. That is: iso_alpha_3 was set = “USA”, administrative_area_level_1 was = “United States”, the values in administrative_area_level_2 were = the values that are now in administrative_area_level_1 (e.g., “Puerto Rico”). I believe this is the configuration that most researchers in the U.S. would expect. The values of these variables have changed with the latest update.

Along the same lines, U.S. territories appear to not be represented in the level 3 dataset. I would expect to see the values that are currently in the administrative_area_level_2 field of the level 2 dataset to be in the administrative_area_level_3 field of the level 3 dataset.

I hope I have been clear. I imagine this might be a bit difficult to follow.

Please let me know if you want me to clarify anything or if there is any other way for me to help you.

Best, Ken

From: Emanuele Guidotti; Date: Friday, October 15, 2021 at 7:14 PM; Subject: Re: [covid19datahub/COVID19] Data are not importing as expected (#174)


eguidotti commented 2 years ago

Your description is very clear, thank you.

This was actually intentional, but it may be that I went for the wrong option. Let me explain below.

I found out that US territories were treated both as level 1 countries and as US subdivisions in the level 2 (and 3) datasets. So basically they were duplicated in the dataset. I see 2 options: (a) drop the territories from the level 1 dataset or (b) drop them from US subdivisions.

On the one hand, I see that both NY Times and JHU CSSE are treating them as US subdivisions. On the other hand, the ISO standard treats them at the country level. See e.g. the ISO entry for Puerto Rico (https://www.iso.org/obp/ui/#iso:code:3166:PR). More importantly, I aim at standardizing the dataset worldwide and making it compatible with geospatial databases. GADM (https://gadm.org/download_country.html) treats US territories at the same level as the US. Therefore, I think the best choice would be to use the ISO standard and to make the dataset use the same administrative subdivisions as the GADM database for geospatial analysis. That is: Puerto Rico treated at the same level as the US. In the same way, overseas regions of France are now treated only as level 1 countries, rather than as subdivisions of France.

What do you think?

kenshermock commented 2 years ago

Hi Emanuele,

Your argument makes great sense, as it seems the priority is to standardize the dataset according to international standards.

I would recommend explicitly mentioning in the README how you are now handling the U.S. territories (i.e., at the same level as the U.S. and NOT at the level of a U.S. state), and also that you are treating regions and municipalities in U.S. territories as level 2 entities (i.e., at the same level as a U.S. state). I think this will be surprising and maybe even counterintuitive to researchers in the U.S., but as long as they have this information clearly stated, they (we) can adjust our code accordingly.
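To make "adjust our code accordingly" concrete, here is a minimal Python sketch of how a user could re-nest territory rows under the United States on their own side. It assumes rows are dicts using the column names discussed in this thread; the territory list contains only the two names mentioned above, the function name `nest_territory_under_us` is hypothetical, and "San Juan" is just an illustrative municipality value. Keys such as key_local are left untouched and would need separate handling.

```python
# Territory names as they appear in administrative_area_level_1 (assumed; extend as needed).
US_TERRITORIES = {"Puerto Rico", "Virgin Islands, U.S."}


def nest_territory_under_us(row):
    """Shift a territory row one level down so it sits under the United States."""
    if row.get("administrative_area_level_1") not in US_TERRITORIES:
        return row  # non-territory rows are returned unchanged
    nested = dict(row)
    # The old level 2 (e.g. a municipality) becomes level 3,
    # and the territory itself becomes level 2, like a U.S. state.
    nested["administrative_area_level_3"] = row.get("administrative_area_level_2", "")
    nested["administrative_area_level_2"] = row["administrative_area_level_1"]
    nested["administrative_area_level_1"] = "United States"
    nested["iso_alpha_3"] = "USA"
    return nested


example = {
    "iso_alpha_3": "PRI",
    "administrative_area_level_1": "Puerto Rico",
    "administrative_area_level_2": "San Juan",
    "administrative_area_level_3": "",
}
print(nest_territory_under_us(example))
```

The function copies the row instead of modifying it in place, so the original ISO-style layout stays available for comparison.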

As of now, I do not find any information at all about U.S. territories in the level 3 dataset. I’m not sure if that is your intent (although if I follow your logic all the way through, perhaps it is the intent). I wasn’t sure when you mentioned the “(and 3)” in your previous email whether that was referring to the old datasets or the updated ones. Wanted to be very clear that no information about U.S. territories exists in the current level 3 dataset.

I hope I’ve been of help. Happy to keep this discussion going if I can assist in any way.

Ken

From: Emanuele Guidotti; Date: Saturday, October 16, 2021 at 11:22 AM; Subject: Re: [covid19datahub/COVID19] Data are not importing as expected (#174)


eguidotti commented 2 years ago

Hi Ken, I'm very sorry for the confusion. This update shouldn't be live yet but it seems something went wrong. I apologize for that.

Please let me re-consider this issue, as I'd like to keep the dataset intuitive and would avoid breaking changes for backward compatibility.

I thought the best way to solve the duplicates (e.g. Puerto Rico treated both at the level of the US and of a US state) would be to treat them only at level 1. But keeping them only at level 2 would probably cause fewer inconveniences. Your feedback is very useful!

Let me dig deeper for a few European countries with a similar issue. I'd like to arrive at a final standard I can adopt and describe clearly in the README.

I will update this issue later today. Thanks for your help and your patience in this transition phase!

eguidotti commented 2 years ago

To arrive at a stable solution, I think the following 2 points should be satisfied:

This means, e.g.:

In this way, the level 1 dataset would be compatible with international standards. The level 2 and 3 datasets would be compatible with the same partition of the country established by the data provider. This avoids any breaking changes and it is easy to maintain and document. The fact that Puerto Rico (or similar) may appear twice in the datasets needs to be stated in the documentation but should not be a problem. Indeed, the data provider at level 1 (territory at the country level) may differ from the data provider at level 2 (territory under another country).

So the data for Puerto Rico and the other US territories should be back under the US, and also in the level 1 dataset. It would be great if you could share your thoughts on that and let me know if the import is now working as expected.

eguidotti commented 2 years ago

This seems fixed! And the new version is available. Please see the changelog