epiforecasts / covidregionaldata

An interface to subnational and national level COVID-19 data. For all countries supported, this includes a daily time-series of cases. Wherever available we also provide data on deaths, hospitalisations, and tests. National level data is also supported using a range of data sources as well as linelist data and links to intervention data sets.
https://epiforecasts.io/covidregionaldata/

Add Switzerland data #365

Closed RichardMN closed 3 years ago

RichardMN commented 3 years ago

This adds level 1 (canton) level data for Switzerland.

There are some open questions.

Currently it uses all the different language versions for cantons which have different names in any of Switzerland's four languages. This is a compromise, but may not be sustainable or useful. I don't know if anyone else has a better sense or preference to express on how to resolve this?

Currently it excludes data for Liechtenstein. Data for Liechtenstein is treated as just another canton in the source (https://github.com/openZH/covid_19/ from @openZH), using "FL" as the region code. Liechtenstein isn't a canton, and Liechtenstein does have a separate set of ISO 3166-2 codes. I could add an optional param to pass Liechtenstein data through and fake something which looks like an ISO 3166-2 code for Liechtenstein, in case someone wants to do data analysis treating Liechtenstein as a contiguous/continuous part of Switzerland.

Currently it is passing through (not removing) some of the additional fields (e.g. ventilators, ICU) present in the source data but without renaming them usefully (or documenting them). I'd welcome advice on whether to keep this or not.

seabbs commented 3 years ago

This looks great.

On the language issue, does this mean there are duplicates in the data? If that is the case, I can imagine it becoming really quite confusing very fast... My preference would be to default to returning unique rows only, so that summarised counts are unique. We could add a custom method to this class to allow selection of region names by language, but this seems like an optional extra?

Liechtenstein is tricky. I suppose keeping it in with a non-functional ISO code makes the most sense?

We have been passing through all useful optional fields and, whilst not ideal, this is something we should keep doing. We could think about expanding the standard naming scheme to include more fields (like ICU usage).

Ahead of merging can you update the news and dev version number?

RichardMN commented 3 years ago

> On the language issue, does this mean there are duplicates in the data? If that is the case, I can imagine it becoming really quite confusing very fast... My preference would be to default to returning unique rows only, so that summarised counts are unique. We could add a custom method to this class to allow selection of region names by language, but this seems like an optional extra?

I should have been clearer. The source data uses two-letter codes which uniquely identify each region and match up with the ISO 3166-2 codes, so there are no duplicate rows. The Wikipedia list of names for these regions provides multilingual options for cantons which are multilingual; these are currently added as a single long character string which lists them all. I don't know of a "safe default" to apply when choosing one language over another for each canton. (Coming from another federal country with sensitive language politics, I approach the question with some trepidation.) I may go through and just choose whichever seems "more common in English usage", but this will be arbitrary and we may get complaints or a request to change it.
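To make the trade-off concrete, here is a minimal sketch in Python (the package itself is written in R, so this is only an illustration) of the two options: the current concatenated label versus a single name chosen by an explicit per-canton default, with an optional language override. The `CANTON_NAMES` table is a small hand-written excerpt, and `DEFAULT_LANG`, `joined_name` and `preferred_name` are hypothetical names, not part of covidregionaldata. The defaults chosen here are arbitrary placeholders.

```python
# Hypothetical excerpt: source code -> names by language code.
# Not every canton or language variant is listed.
CANTON_NAMES = {
    "ZH": {"de": "Zürich"},
    "FR": {"fr": "Fribourg", "de": "Freiburg"},
    "VS": {"fr": "Valais", "de": "Wallis"},
    "GR": {"de": "Graubünden", "it": "Grigioni", "rm": "Grischun"},
}

# Arbitrary, documented default language per canton (an assumption,
# open to exactly the kind of complaint mentioned above).
DEFAULT_LANG = {"ZH": "de", "FR": "fr", "VS": "fr", "GR": "de"}


def joined_name(code, sep=" / "):
    """Current behaviour: one long label listing every known name."""
    return sep.join(CANTON_NAMES[code].values())


def preferred_name(code, lang=None):
    """Single name: the requested language if available, else the default."""
    names = CANTON_NAMES[code]
    if lang in names:
        return names[lang]
    return names[DEFAULT_LANG[code]]


print(joined_name("GR"))
print(preferred_name("FR"))
print(preferred_name("FR", lang="de"))
```

Keeping the default in one small table means a later request to change a canton's label is a one-line edit, and the two-letter code stays the only key used for joins.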

> Liechtenstein is tricky. I suppose keeping it in with a non-functional ISO code makes the most sense?

I will mock up something which looks like an ISO 3166-2 code but which does not imply that Liechtenstein is part of Switzerland and does not create a collision with the actual ISO 3166-2 codes which exist for Liechtenstein.
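As a rough illustration of what I mean (in Python rather than the package's R, and not the final implementation), the sketch below builds region codes from the source abbreviations, drops "FL" by default, and otherwise assigns a placeholder. The placeholder value `FL-FL`, the function `region_code` and the parameter `include_liechtenstein` are all hypothetical. The only facts relied on are that Swiss canton codes have the form `CH-XX` and that the real Liechtenstein subdivision codes have the form `LI-` followed by two digits.

```python
import re

# Form of the real ISO 3166-2 codes for Liechtenstein (LI-01, LI-02, ...).
REAL_LI_CODE = re.compile(r"^LI-\d{2}$")

# Hypothetical placeholder: looks like an ISO 3166-2 code but is not one.
PLACEHOLDER_FL = "FL-FL"


def region_code(abbrev, include_liechtenstein=False):
    """Map a source abbreviation to a region code, or None to drop the row."""
    if abbrev == "FL":
        return PLACEHOLDER_FL if include_liechtenstein else None
    return f"CH-{abbrev}"


# The placeholder must not imply Swiss membership or collide with real codes.
assert not PLACEHOLDER_FL.startswith("CH-")
assert REAL_LI_CODE.match(PLACEHOLDER_FL) is None

print(region_code("ZH"))
print(region_code("FL"))
print(region_code("FL", include_liechtenstein=True))
```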

> We have been passing through all useful optional fields and, whilst not ideal, this is something we should keep doing. We could think about expanding the standard naming scheme to include more fields (like ICU usage).

I think having standard renaming may be helpful. I'll also try to document the fields we are passing through (as was done for Lithuania).
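A standard renaming could be as simple as a lookup table applied to each row, with unmapped fields passed through unchanged. The sketch below is in Python for illustration only (the package is R). The source column names are my reading of the openZH CSV and should be checked against that repo's documentation; `icu_current` and `vent_current` are invented names for a possible extended scheme, and `rename_fields` is a hypothetical helper.

```python
# Source column -> standard name (source names assumed from the openZH CSV).
STANDARD_NAMES = {
    "ncumul_conf": "cases_total",
    "ncumul_deceased": "deaths_total",
    "ncumul_tested": "tested_total",
}

# Hypothetical extension of the naming scheme to cover ICU and ventilator use.
EXTENDED_NAMES = {
    "current_icu": "icu_current",
    "current_vent": "vent_current",
}


def rename_fields(row, extended=False):
    """Rename known fields in one record; pass everything else through as-is."""
    mapping = dict(STANDARD_NAMES)
    if extended:
        mapping.update(EXTENDED_NAMES)
    return {mapping.get(key, key): value for key, value in row.items()}


row = {"date": "2021-01-01", "ncumul_conf": 10, "current_icu": 2}
print(rename_fields(row))
print(rename_fields(row, extended=True))
```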

> Ahead of merging can you update the news and dev version number?

Yes.

RichardMN commented 3 years ago

I think I've now applied the changes requested.

I have just linked to the existing documentation of additional fields in the @openZH repo, since it is already in clear English. (For Lithuania I was providing a translation of the Lithuanian-language documentation, which was also harder to find.)

RichardMN commented 3 years ago

Sincere thanks to @metaodi for looking this over and helping with the names question - and I reiterate my appreciation for the work being done putting this dataset together.

Thanks also for the referral to https://opendata.swiss/en/dataset/covid-19-schweiz; there is more in there and (with some digging) a second step might be to try adding some of the additional (or alternate) data from there.

seabbs commented 3 years ago

Great job on this @RichardMN and thanks @metaodi for looking this over (and also of course gathering the data!)