Closed winwiz1 closed 4 years ago
Thanks for reporting this! You are mentioning multiple issues here, let me try to answer them one at a time.
I assume a Level 2 or Level 3 entry in index.csv refers to a subregion2_name or locality_name respectively - with both being located within a state/province L1 area denoted by subregion1_name. This led me to presume the index literal for L2 and L3 entries should be in a form L0_L1_L2/3.
This is almost right. The one caveat is that L3 is a "special cases" level. Most of the time it refers to a city, which is likely to be located within an L1 or L2 region, but that may not always be the case.
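To illustrate the caveat, here is a minimal sketch of a key-format check. It assumes keys are underscore-separated, as the L0_L1_L2/3 form suggests, and that a level 3 key may hang off a level 0, 1 or 2 parent. The function names are hypothetical and not part of the dataset's tooling.

```python
def key_depth(key: str) -> int:
    """Count the underscores in a key, e.g. "UA_30" has depth 1."""
    return key.count("_")


def follows_nested_form(key: str, aggregation_level: int) -> bool:
    """Check whether a key's depth is plausible for its aggregation level.

    Assumption: levels 0-2 nest strictly, so depth equals the level.
    Level 3 (localities) may be attached to a level 0, 1 or 2 parent,
    so any depth from 1 to 3 is accepted rather than exactly 3.
    """
    depth = key_depth(key)
    if aggregation_level < 3:
        return depth == aggregation_level
    return 1 <= depth <= 3


# Keys taken from this thread: UA_30 is level 1, UA_KBP is level 3.
print(follows_nested_form("UA_30", 1))
print(follows_nested_form("UA_KBP", 3))
```

A stricter check that demands exactly three underscores at level 3 would reject UA_KBP, which is the situation described in the question.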
I tried to verify this assumption but in the README the links to the schema documentation and data loading tutorial under the Notes about the data heading are broken. Is the assumption correct?
We will fix the links. The link to the schema should point to this: https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/main/docs/table-index.md.
The following L2 and L3 indexes don't comply with the L0_L1_L2/3 format:
The Libya subregions are indeed a bug, we will fix those. Unfortunately, UA_KBP appears to be a special case since it's a city but it's reported as admin level 1: https://en.wikipedia.org/wiki/ISO_3166-2:UA.
The pull request I'm about to submit will close this issue, but feel free to reopen if you have any more questions.
Thank you for the clarification and for the fixes.
Unfortunately, UA_KBP appears to be a special case since it's a city but it's reported as admin level 1: https://en.wikipedia.org/wiki/ISO_3166-2:UA.
The ISO standard you referred to states the city is at the level 1, as mentioned above. This is correctly reflected in index.csv by the index UA_30 having aggregation_level=1. The index UA_KBP seems to refer to the same city with the same wikidata=Q1899, placing the city additionally at the level 3 and leading to the entries with duplicating case counts in epidemiology.csv:
2020-09-09,UA_30,310,4,84,,15821,250,4910,
2020-09-09,UA_KBP,310,4,84,,15821,250,4910,
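A minimal sketch of how such duplicates could be spotted, using only the two rows quoted above. It treats the second column as the index and compares everything else positionally, so it does not depend on the column names. The function name `find_duplicate_rows` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The two rows quoted in this thread (no header line).
SAMPLE = """\
2020-09-09,UA_30,310,4,84,,15821,250,4910,
2020-09-09,UA_KBP,310,4,84,,15821,250,4910,
"""


def find_duplicate_rows(lines, key_column=1):
    """Group rows that are identical except for the index column.

    Returns a dict mapping the remaining fields to the list of indexes
    that share them, keeping only groups with more than one index.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        rest = tuple(row[:key_column] + row[key_column + 1:])
        groups[rest].append(row[key_column])
    return {rest: keys for rest, keys in groups.items() if len(keys) > 1}


duplicates = find_duplicate_rows(io.StringIO(SAMPLE))
for rest, keys in duplicates.items():
    print(rest[0], keys)
```

Identical counts alone do not prove two indexes refer to the same place, so a match here is only a hint to compare the wikidata values in index.csv.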
That's correct, UA_30 is equivalent to UA_KBP. We're trying to indicate that UA_KBP is a city, whereas UA_30 is an admin level 1 region.
They are both the same, but if we omitted UA_KBP it would be hard to find for someone who is looking for cities. Whereas you can currently search for aggregation_level=3 and find cities from all around the world regardless of whether they are descendants of levels 0, 1 or 2.
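As a rough sketch of that lookup, the snippet below filters a tiny in-memory stand-in for index.csv on aggregation_level=3, and also lists level 3 entries whose wikidata value appears at a lower level, since those would be double counted when summing. The column name `key` is an assumption (the thread only calls it the index); the two rows and their values come from this discussion.

```python
import csv
import io

# Stand-in for index.csv with only the columns discussed in this thread.
# The header name "key" is an assumption.
INDEX_SAMPLE = """\
key,aggregation_level,wikidata
UA_30,1,Q1899
UA_KBP,3,Q1899
"""


def load_index(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def cities(rows):
    """Keys of all entries with aggregation_level=3."""
    return [r["key"] for r in rows if r["aggregation_level"] == "3"]


def keys_to_skip_when_summing(rows):
    """Level 3 keys whose wikidata ID also appears at levels 0-2."""
    lower = {
        r["wikidata"]
        for r in rows
        if r["aggregation_level"] != "3" and r["wikidata"]
    }
    return {
        r["key"]
        for r in rows
        if r["aggregation_level"] == "3" and r["wikidata"] in lower
    }


rows = load_index(INDEX_SAMPLE)
print(cities(rows))
print(keys_to_skip_when_summing(rows))
```

Keeping both entries makes city searches simple, at the cost that anyone aggregating across levels has to drop one of the two aliases first.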
Hi,
I assume a Level 2 or Level 3 entry in index.csv refers to a subregion2_name or locality_name respectively - with both being located within a state/province L1 area denoted by subregion1_name. This led me to presume the index literal for L2 and L3 entries should be in a form L0_L1_L2/3.
I tried to verify this assumption but in the README the links to the schema documentation and data loading tutorial under the Notes about the data heading are broken. Is the assumption correct?
The following L2 and L3 indexes don't comply with the L0_L1_L2/3 format: