Closed RichardMN closed 3 years ago
This looks great.
On the language issue: does this mean there are duplicates in the data? If so, I can imagine it becoming confusing very fast. My preference would be to default to returning unique rows only, so that summarised counts are unique. We could add a custom method to this class to select region names by language, but this seems like an optional extra?
Liechtenstein is tricky. I suppose keeping it in with a non-functional ISO code makes the most sense?
We have been passing through all useful optional fields and, whilst not ideal, that is something we should keep doing. We could think about expanding the standard naming scheme to include more fields (like ICU usage).
Ahead of merging can you update the news and dev version number?
> On the language issue: does this mean there are duplicates in the data? If so, I can imagine it becoming confusing very fast. My preference would be to default to returning unique rows only, so that summarised counts are unique. We could add a custom method to this class to select region names by language, but this seems like an optional extra?
I should have been clearer. The source data uses two-letter codes which uniquely identify each region and match up with the ISO 3166-2 codes, so there are no duplicate rows. The Wikipedia list of names for these regions provides multilingual options for cantons which are multilingual; these are currently added as a single long character string which lists them all. I don't know of a "safe default" for choosing one language over the other for each canton. (Coming from another federal country with sensitive language politics, I approach the question with some trepidation.) I may go through and just choose whichever seems "more common in English usage", but this will be arbitrary and we may get complaints or a request to change it.
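To make the trade-off concrete, here is a minimal Python sketch (for illustration only; the package itself is written in R) of keeping one entry per two-letter code and choosing a display name by language preference. The `CANTON_NAMES` table, the `pick_name` and `all_names` helpers, and the default preference order are all hypothetical and not taken from the package.

```python
# Hypothetical lookup: names per language for a few cantons, keyed by the
# two-letter code used in the source data. Only a handful are listed here.
CANTON_NAMES = {
    "BE": {"de": "Bern", "fr": "Berne"},
    "FR": {"fr": "Fribourg", "de": "Freiburg"},
    "VS": {"fr": "Valais", "de": "Wallis"},
    "GR": {"de": "Graubünden", "rm": "Grischun", "it": "Grigioni"},
    "ZH": {"de": "Zürich"},
}


def all_names(code):
    """Join every language variant into one string (the current compromise)."""
    return "/".join(CANTON_NAMES[code].values())


def pick_name(code, preferred=("de", "fr", "it", "rm")):
    """Return the first available name in the caller's preference order.

    The default order is an arbitrary assumption; falls back to the joined
    string if none of the preferred languages is available for this canton.
    """
    names = CANTON_NAMES[code]
    for lang in preferred:
        if lang in names:
            return names[lang]
    return all_names(code)
```

Because the code stays the key, summarised counts remain one per canton whichever name is shown; only the label changes with the caller's preference.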
> Liechtenstein is tricky. I suppose keeping it in with a non-functional ISO code makes the most sense?
I will mock up something which looks like an ISO 3166-2 code but which does not imply that Liechtenstein is part of Switzerland and does not create a collision with the actual ISO 3166-2 codes which exist for Liechtenstein.
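One possible shape for this, sketched in Python purely as an illustration (the package is R, and this is not the code in the PR): cantons get the usual `CH-` prefix, while the source's `FL` code maps to a placeholder that has the ISO 3166-2 layout but uses neither the `CH-` nor the `LI-` prefix. The placeholder value `FL-FL` and both helper names are hypothetical.

```python
import re

# Hypothetical placeholder: shaped like an ISO 3166-2 code, but it does not
# start with "CH-" (Swiss cantons) or "LI-" (Liechtenstein's real codes).
LIECHTENSTEIN_PLACEHOLDER = "FL-FL"


def region_code(source_code):
    """Map a two-letter source code to an ISO 3166-2 style code."""
    if source_code == "FL":
        return LIECHTENSTEIN_PLACEHOLDER
    return f"CH-{source_code}"


def looks_like_iso_3166_2(code):
    """Check the general layout: two letters, a hyphen, 1-3 alphanumerics."""
    return re.fullmatch(r"[A-Z]{2}-[A-Z0-9]{1,3}", code) is not None
```

Anyone joining on real ISO 3166-2 codes would simply find no match for the placeholder, which is the intended "non-functional" behaviour.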
> We have been passing through all useful optional fields and, whilst not ideal, that is something we should keep doing. We could think about expanding the standard naming scheme to include more fields (like ICU usage).
I think having standard renaming may be helpful. I'll also try to document the fields we are passing through (as was done for Lithuania).
> Ahead of merging can you update the news and dev version number?
Yes.
I think I've now applied the changes requested.
I have just linked to the existing documentation of additional fields in the @OpenZH repo since it is already in clear English. (For Lithuania I was providing a translation of the Lithuanian-language documentation, which was also harder to find.)
Sincere thanks to @metaodi for looking this over and helping with the names question - and I reiterate my appreciation for the work being done putting this dataset together.
Thanks also for the referral to https://opendata.swiss/en/dataset/covid-19-schweiz; there is more in there, and (with some digging) a second step might be to try adding some of the additional (or alternate) data from it.
Great job on this @RichardMN and thanks @metaodi for looking this over (and also of course gathering the data!)
This adds level 1 (canton) data for Switzerland.
There are some open questions.
Currently it uses all the different language versions for cantons which have different names in any of Switzerland's four languages. This is a compromise, but may not be sustainable or useful. I don't know if anyone else has a better sense or preference to express on how to resolve this?
Currently it excludes data for Liechtenstein. Data for Liechtenstein is treated as just another canton in the source (https://github.com/openZH/covid_19/ from @openZH), using "FL" as the region code. Liechtenstein isn't a canton, and Liechtenstein does have a separate set of ISO 3166-2 codes. I could add an optional param to pass Liechtenstein data through and fake something which looks like an ISO 3166-2 code for Liechtenstein, in case someone wants to do data analysis treating Liechtenstein as a contiguous/continuous part of Switzerland.
Currently it is passing through (not removing) some of the additional fields (e.g. ventilators, ICU) present in the source data but without renaming them usefully (or documenting them). I'd welcome advice on whether to keep this or not.
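If renaming turns out to be preferable, it could be as small as a lookup applied to each row, leaving unlisted fields untouched. The Python sketch below is an illustration only (the package is R); the source column names on the left are assumed from the fields mentioned above, and the target names on the right are hypothetical, not an agreed naming scheme.

```python
# Assumed source column names -> hypothetical standardised names.
FIELD_RENAMES = {
    "current_hosp": "hosp_current",
    "current_icu": "icu_current",
    "current_vent": "vent_current",
}


def rename_fields(row, renames=FIELD_RENAMES):
    """Rename known optional fields; pass every other field through as-is."""
    return {renames.get(key, key): value for key, value in row.items()}
```

For example, `rename_fields({"date": "2020-11-01", "current_icu": 3})` keeps `date` unchanged and returns the ICU count under `icu_current`.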