covid19datahub / COVID19

A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution
https://covid19datahub.io
GNU General Public License v3.0

International/upper-national data #189

Closed I-I-IT closed 2 years ago

I-I-IT commented 2 years ago

I suggest adding new levels for areas that do not fall strictly within one country. Here is my proposed nomenclature, divided into political designations and geographical areas.

Level 0: World

Level 1: Supercontinent or very large multi-continent (M-C) geographical area; large M-C political aggregation

Level 2: Continent or medium-to-small M-C geographical area; small M-C political aggregation or continental political aggregation

Level 3: Anything below that (geographical areas within a single continent, and political aggregations smaller than a continent)

Examples for easier understanding:

Level 1: Eurasia; G20; South; North

Level 2: Asia, Europe, ...; G7; EU; Middle East (spans 3 continents)

Level 3: British Isles, Caribbean

Note: there are two main points for discussion. First, how we cut geographical areas will need to be decided: do we assign whole countries to them, or do we also include sub-national areas? Second, a common definition for each area will need to be agreed on, perhaps in a wiki or something similar.

On the technical side this seems feasible, although it would probably go in a separate category.

What do you think?

eguidotti commented 2 years ago

Hi @TechFanTheo and thanks for your message. I'm not sure I understand correctly. Let me explain below.

The subdivisions are dictated by the data sources (e.g., national ministries of health). They typically provide the data at 3 levels:

  • national (country)
  • sub-national (regions/states/cantons)
  • lower-level (city/municipality/counties)

So we are using levels 1, 2, 3 to mirror the original providers. (For instance, there is no provider for Europe as a whole.)

In the current version of the data hub, you can also find key_gadm, which is the identifier of the administrative area used in the GADM (https://gadm.org/) database. This should serve to map the areas to standardized administrative levels. Is this what you are looking for?
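A minimal sketch of how key_gadm might be used to recover a standardized level. It assumes the identifiers follow the GADM GID pattern (for example `ITA`, `ITA.10_1`, `ITA.10.1_1`, where each dot adds one administrative level and the trailing `_1` is a version tag). That format is an assumption, not something confirmed in this thread, and `gadm_country` and `gadm_level` are hypothetical helpers rather than part of the data hub:

```python
def gadm_country(gid: str) -> str:
    """Return the country code at the start of a GADM-style identifier."""
    # Assumed format: "<ISO3>[.<n>[.<n>...]]_<version>"
    return gid.split("_", 1)[0].split(".", 1)[0]


def gadm_level(gid: str) -> int:
    """Return the administrative depth implied by a GADM-style identifier."""
    code = gid.split("_", 1)[0]  # drop the version suffix, e.g. "_1"
    return code.count(".")       # one dot per level below the country


# Illustrative identifiers only; they are not taken from the dataset.
for gid in ["ITA", "ITA.10_1", "ITA.10.1_1"]:
    print(gid, gadm_country(gid), gadm_level(gid))
```

Note that GADM depth 0 is the country, so it is offset by one from the data hub's levels 1, 2, 3 described above.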

A user may aggregate the data to produce a dataset at the continent level, or at other standardized administrative levels via GADM. However, aggregating the data is always quite dangerous (for instance, if we miss one European country, the counts for Europe would be understated). This is why we publish the counts exactly as provided by the official sources, without aggregating or manipulating them. In this sense, we are very much constrained to the administrative subdivisions used by the original provider.

Would it be useful to add this explanation to the README?
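To illustrate the risk, here is a rough sketch of a user-side aggregation that reports which expected countries are missing instead of silently understating the total. The membership table, the country codes and the counts are all made up for illustration, and `aggregate` is a hypothetical helper, not something the data hub provides:

```python
# Hypothetical membership table: which countries make up each aggregate.
# Deliberately tiny; a real table would need an agreed definition per area.
EXPECTED_MEMBERS = {
    "Europe": {"ITA", "FRA", "DEU"},
}


def aggregate(rows, members):
    """Sum country-level counts per aggregate and list missing countries.

    rows: iterable of (country_code, count) pairs for a single date.
    members: mapping of aggregate name -> set of expected country codes.
    Returns {aggregate: (total, sorted list of missing country codes)}.
    """
    counts = dict(rows)
    reported = set(counts)
    result = {}
    for name, expected in members.items():
        total = sum(counts[c] for c in expected & reported)
        missing = sorted(expected - reported)
        result[name] = (total, missing)
    return result


rows = [("ITA", 100), ("FRA", 80)]  # made-up counts; DEU has not reported
summary = aggregate(rows, EXPECTED_MEMBERS)
print(summary)  # {'Europe': (180, ['DEU'])}
```

A non-empty list of missing countries means the total is a lower bound rather than the true figure for the aggregate.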

I-I-IT commented 2 years ago

I see. I believe such a classification would be useful, but I understand you don't want to get into that. It is also something of a choice between staying an aggregator and becoming more of an analyser like OWID. Yes, I believe explaining it would be useful, just to clarify.

(By the way, I was looking for projects that use the data you are providing. A few weeks ago I was on a GitHub page that listed them, but I can't find it any more. Do you know where it is? I know I can download the data, but to be honest I'm a bit too lazy to learn R, etc. right now.)

eguidotti commented 2 years ago

OK, I'll update the README.

Here you can find the academic publications that use our data. But I'm not aware of the GitHub page you are referring to. If you find it, please let me know. I'd be happy to have a look at it!

Thanks!

I-I-IT commented 2 years ago

Sorry, I didn't realise that the COVID-19 Data Hub and COVID-19 Open-Data (a GitHub project now sponsored by Google but previously independent; there is an old GitHub page from before Google's involvement) were not the same. I seriously wonder what the differences are, though. The list I was talking about was not for this dataset, sorry.

eguidotti commented 2 years ago

Thanks, I've updated the README.

I am aware of that repo. Frankly, I also wondered what the difference was when I first discovered it. It would have been nice to join forces... Now it seems to me that we provide some data that they don't have, and vice versa. One could consider merging them so that the two datasets complement each other. Cheers

I-I-IT commented 2 years ago

It seems that they have an additional level, but I must say that I haven't really managed to use either dataset yet. Anyway, I think I can close the issue now.
