epiforecasts / covidregionaldata

An interface to subnational and national level COVID-19 data. For all countries supported, this includes a daily time-series of cases. Wherever available we also provide data on deaths, hospitalisations, and tests. National level data is also supported using a range of data sources as well as linelist data and links to intervention data sets.
https://epiforecasts.io/covidregionaldata/

Alternative source for Switzerland #412

Closed: Bisaloo closed 2 years ago

Bisaloo commented 3 years ago

Related: https://github.com/epiforecasts/covid19-forecast-hub-europe/issues/906

We have been made aware of an alternative data source for Switzerland that gives completely different results from the current one (at least for hospitalisations): https://www.covid19.admin.ch/en/overview.

I'm keen to add this data source to covidregionaldata, but I'm not sure what the best way to handle this situation is, since there is already one data source and I don't know whether one is more reliable than the other.

There are several options:

RichardMN commented 3 years ago

I added Switzerland, having found what appeared to be a fairly comprehensive and reputable data source, but I've not compared it with others and I don't think I'm qualified to judge between them.

If someone says that we should switch to another then I can look at replacing the back-end - and this could include studying @kathsherratt's dark magic on the UK code. (I think I ported the UK code to the new R6 system and I still find it baffling.)

It would probably be fairly straightforward to just use a different source for a subset of our data columns. I've found some new sources for Lithuania and had pondered adding hospitalization data.
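As a rough illustration of taking only some columns from a second source, here is a minimal Python sketch. The package itself is written in R, so this only shows the shape of the join. The column names (date, region, cases_new, hosp_new) and the function merge_columns are hypothetical, not the package's API.

```python
def merge_columns(primary, secondary, keys=("date", "region"), columns=("hosp_new",)):
    """Overwrite `columns` in `primary` rows with values from matching `secondary` rows.

    Rows are plain dicts; two rows match when they agree on every field in `keys`.
    Rows without a match in `secondary` are returned unchanged.
    """
    # Index the secondary source by its key fields for constant-time lookup.
    lookup = {tuple(row[k] for k in keys): row for row in secondary}
    merged = []
    for row in primary:
        out = dict(row)  # copy so the caller's data is not modified
        match = lookup.get(tuple(row[k] for k in keys))
        if match is not None:
            for col in columns:
                if col in match:
                    out[col] = match[col]
        merged.append(out)
    return merged


# Hypothetical example rows: cases come from the first source,
# hospitalisations are taken from the second one where available.
current = [
    {"date": "2021-10-01", "region": "ZH", "cases_new": 120, "hosp_new": 3},
    {"date": "2021-10-01", "region": "BE", "cases_new": 80, "hosp_new": 1},
]
alternative = [
    {"date": "2021-10-01", "region": "ZH", "hosp_new": 7},
]
combined = merge_columns(current, alternative)
```

Keeping the primary source's rows as the base means a gap in the second source never drops a day of case data.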

seabbs commented 3 years ago

Thanks, both, good to have a discussion on this.

So the first step might be to open an issue on the source repository we are using (https://github.com/openZH/covid_19/issues). It certainly looks maintained and high quality, so there must be solid reasons for the discrepancy. On the other hand, the source @bisaloo has found looks quite official and so seems like something we would want to source data from.

If we can establish some kind of superiority of either source (on the face of it the official source seems like the obvious choice) then switching would seem to make sense. If we can't, or if we think there are still good reasons to support both, I see two options:

  1. In the current class, add a new argument (e.g. like nhsregions in the UK class) and then add custom control code to download and join the data, as in the UK class.

  2. If the two sources fully overlap and are equally valid, we would ideally offer a choice of source with two separate documented classes. This gets a bit tricky with how we have set things up but seems doable in principle. I think the way to do this would be to have child classes for Switzerland, named Switzerland_source_name, and have the matching one initialised and returned when the Switzerland class is called with that source. That seems like a slightly more general solution to what is perhaps a fairly common problem, but the dispatch approach is less than satisfying.
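The dispatch idea in the second option could look roughly like the sketch below. It is written in Python purely to show the pattern; the package uses R6 classes, and none of these names exist in it. The class names, the source labels "openzh" and "admin", and the switzerland() factory are all assumptions made for illustration.

```python
class SwitzerlandBase:
    """Shared behaviour for every Switzerland data source (hypothetical)."""

    source_name = None
    source_url = None

    def describe(self):
        return f"{type(self).__name__}: {self.source_url}"


class SwitzerlandOpenZH(SwitzerlandBase):
    # The source currently used by the package, per the thread.
    source_name = "openzh"
    source_url = "https://github.com/openZH/covid_19"


class SwitzerlandAdmin(SwitzerlandBase):
    # The alternative source raised in this issue.
    source_name = "admin"
    source_url = "https://www.covid19.admin.ch/en/overview"


# Registry mapping a source label to the child class that handles it.
_SOURCES = {cls.source_name: cls for cls in (SwitzerlandOpenZH, SwitzerlandAdmin)}


def switzerland(source="openzh"):
    """Return an instance of the child class registered for `source`."""
    try:
        return _SOURCES[source]()
    except KeyError:
        known = ", ".join(sorted(_SOURCES))
        raise ValueError(f"unknown source {source!r}; expected one of: {known}") from None
```

Each child class can then carry its own documentation and source URL, while users still enter through a single name. The cost is the one noted above: the entry point returns a different class depending on an argument, which is not an especially clean form of dispatch.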

seabbs commented 3 years ago

Anyone make any progress on this?

RichardMN commented 3 years ago

Looking at this, it seems we may need a two-stage download method.

The data locations are updated daily and provided in a JSON file available at https://www.covid19.admin.ch/api/data/context

This gives the following (excerpt):

      "csv": {
        "daily": {
          "cases": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Cases_geoRegion.csv",
          "casesVaccPersons": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Cases_vaccpersons.csv",
          "hosp": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Hosp_geoRegion.csv",
          "hospVaccPersons": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Hosp_vaccpersons.csv",
          "death": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Death_geoRegion.csv",
          "deathVaccPersons": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Death_vaccpersons.csv",
          "test": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Test_geoRegion_all.csv",
          "testPcrAntigen": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Test_geoRegion_PCR_Antigen.csv",
          "hospCapacity": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19HospCapacity_geoRegion.csv",
          "re": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Re_geoRegion.csv",
          "intCases": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19IntCases.csv",
          "virusVariantsWgs": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Variants_wgs.csv",
          "covidCertificates": "https://www.covid19.admin.ch/api/data/20211001-z0wrsmyu/sources/COVID19Certificates.csv"
        },

So we may need to have something which grabs the JSON, determines which CSVs it wants to download, then downloads them. I've looked at the UK code a bit and I'm currently not sure how to document two different data sources like this - what do we list as the source URL and text? Still, I can imagine how this may be done.
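A minimal sketch of that two-stage approach, in Python for illustration only (the package is R). The excerpt above does not show where the "csv" block sits in the full JSON document, so the sketch searches for it rather than assuming a path. The function names and the default choice of files are assumptions.

```python
import json
from urllib.request import urlopen

CONTEXT_URL = "https://www.covid19.admin.ch/api/data/context"


def find_key(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def select_csv_urls(context, wanted=("cases", "hosp", "death", "test")):
    """Stage 2: pick the daily CSV URLs we want out of the parsed context."""
    csv_block = find_key(context, "csv")
    if not isinstance(csv_block, dict) or "daily" not in csv_block:
        raise KeyError("no csv/daily block found in context")
    daily = csv_block["daily"]
    missing = [name for name in wanted if name not in daily]
    if missing:
        raise KeyError(f"context has no daily CSV for: {missing}")
    return {name: daily[name] for name in wanted}


def fetch_context(url=CONTEXT_URL):
    """Stage 1: download and parse the context file (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

Usage would be along the lines of select_csv_urls(fetch_context()), followed by downloading each returned URL. Because the dated path segment changes with every update, only the context URL is stable enough to record as the source.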

github-actions[bot] commented 2 years ago

This issue has been flagged as stale due to lack of activity