european-modelling-hubs / covid19-forecast-hub-europe_archive

European Covid-19 Forecast Hub.
https://covid19forecasthub.eu

Hospitalisation data update: issue discussion #1170

Closed kathsherratt closed 1 year ago

kathsherratt commented 3 years ago

We have today updated the hospitalisation data available in the hub. We now provide a cleaner data set covering weekly data for 20 countries. We hope that this will encourage more teams to submit hospitalisation forecasts.

Key changes to hospitalisation data

We have expanded data availability across the 32 locations in the Hub:

  1. We now provide a data set covering a larger set of countries (20). We have added hospitalisation data for: Czechia, Iceland, Italy, Latvia, Liechtenstein, Malta, Cyprus, Poland, Portugal, and Switzerland. Data for the UK, which stopped updating over the summer, are now up to date.

  2. We no longer include data for Spain, due to the extent of data revisions and variation between data sources. As a result, hospitalisation forecasts for Spain will not pass validation checks in the hub submission process. Please bear this in mind when submitting forecasts on Monday and in future. We will of course be accommodating this week given the very short notice.

  3. The remaining 10 of the 11 countries currently included in the “truth data” file are all still covered, with minor or no changes to past observed data. Absolute counts for some weeks have decreased slightly in Ireland, and some countries are now subject to data truncation (see below). I attach a plot showing a comparison of new and old datasets.
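
For teams that want to catch the Spain change before submitting, a minimal pre-submission check might look like the sketch below. It assumes each forecast row carries a `location` field with a two-letter country code and a `target` field whose hospitalisation entries end in `inc hosp`. The helper name `split_hosp_rows` and the example rows are hypothetical, and this is not the hub's own validation code.

```python
# Hypothetical pre-submission check; not the hub's validation code.
# Assumes each forecast row has "location" (two-letter country code) and
# "target" (for example "1 wk ahead inc hosp") fields.

# Locations without hospitalisation truth data. Only Spain is named in the
# announcement; extend this set if further countries are excluded.
NO_HOSP_TRUTH = {"ES"}


def split_hosp_rows(rows):
    """Return (kept, dropped): rows that can be evaluated and rows that cannot."""
    kept, dropped = [], []
    for row in rows:
        is_hosp = row["target"].endswith("inc hosp")
        if is_hosp and row["location"] in NO_HOSP_TRUTH:
            dropped.append(row)
        else:
            kept.append(row)
    return kept, dropped


# Made-up rows for illustration only.
rows = [
    {"location": "ES", "target": "1 wk ahead inc hosp", "value": 100},
    {"location": "ES", "target": "1 wk ahead inc case", "value": 5000},
    {"location": "BE", "target": "1 wk ahead inc hosp", "value": 300},
]
kept, dropped = split_hosp_rows(rows)
print(len(kept), "rows kept,", len(dropped), "rows dropped")
```

Only the Spanish hospitalisation row is dropped here; Spanish case forecasts are left untouched.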

We have also made some changes to the format of the observed data we provide in the hub repository:

  1. We have changed our truth dataset to contain weekly data for all countries. Some of the sources of hospitalisation data we use are only available on a weekly basis, so all data are presented weekly for consistency. We continue to use the MMWR week definition of a Sunday-to-Saturday week. For countries where we have been able to source daily data, we still provide access to these data in the repository. However, this comes with a substantial warning that the daily data are subject to revisions, and we will only use the weekly truth dataset in forecast evaluation.

  2. We are now truncating the last 1-2 weeks for a selection of countries where in the past we have found these to be subject to large revisions. This means some recent data will be removed and take longer to become available (for example, the last week of Belgian data is now 2021-10-23). We’ll leave it up to you whether this means you make shorter forecasts. Please note that the way the forecasts are named remains unchanged, i.e. a “1 week forecast” is 1 week from the day on which the forecast is made, even if data is only available up to 3 weeks ago. As an aside, the truncation is creating some issues in our visualisations which we are working to fix.
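
For anyone working from the daily files, the weekly aggregation and truncation described above can be sketched as follows. This is an illustration under stated assumptions, not the hub's processing code: the function names, the rule of dropping incomplete weeks, and the daily counts are all made up for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_ending_saturday(day):
    """Return the Saturday closing the Sunday-to-Saturday week that contains `day`."""
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return day + timedelta(days=(5 - day.weekday()) % 7)


def to_weekly(daily, truncate_weeks=0):
    """Sum daily counts into complete Sunday-to-Saturday weeks, keyed by the Saturday.

    Weeks with fewer than 7 observed days are dropped (an assumption of this
    sketch), then the most recent `truncate_weeks` weeks are removed.
    """
    totals = defaultdict(int)
    days_seen = defaultdict(int)
    for day, count in daily.items():
        week_end = week_ending_saturday(day)
        totals[week_end] += count
        days_seen[week_end] += 1
    complete = sorted(w for w in totals if days_seen[w] == 7)
    kept = complete[:max(0, len(complete) - truncate_weeks)]
    return {w: totals[w] for w in kept}


# Made-up daily counts: 10 admissions per day from Sunday 2021-10-10
# to Saturday 2021-11-06 (four complete weeks).
start = date(2021, 10, 10)
daily = {start + timedelta(days=i): 10 for i in range(28)}

weekly = to_weekly(daily, truncate_weeks=2)
for week_end, total in weekly.items():
    print(week_end.isoformat(), total)
```

With two weeks truncated, the last week kept in this example ends on 2021-10-23, even though the daily series runs to 2021-11-06.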

Additional background

For current forecasters of hospitalisation data, we realise this is a short-notice change to data access for this Monday's forecasts. If necessary, we can consider removing these from the evaluation dataset, and we would welcome your thoughts on whether this will affect your forecasts this week.

In general we hope that the new dataset will make hospitalisation forecasts more accessible and reliable for all.

Please use the thread below for any questions about the data.

Best wishes, Kath

[Plot "country-diffs": comparison of new and old datasets]

kraus-stat commented 3 years ago

@kathsherratt Hospitalization data for Switzerland look a little suspicious. About 10 to 20 admissions per week in the truth data file (about 2 per day in non-eu.csv) in recent weeks seems too few in comparison with hospital occupancy (about 400 according to Our World in Data). According to the Swiss official website there were about 300 admissions in the last 14 days, i.e. roughly 150 per week. Here's the median case-to-admission ratio in the last 10 weeks, which looks rather strange for Switzerland:

> dat_weekly[,tail(.SD,10),by=country][,list(cases_to_hosp=median(cases/hosp,na.rm=T)),by=country][order(cases_to_hosp)]
    country cases_to_hosp
 1:      CZ      10.92092
 2:      LV      13.42227
 3:      HR      15.29288
 4:      LI      15.50000
 5:      IT      17.26066
 6:      FR      19.57550
 7:      CY      21.67500
 8:      EE      24.01233
 9:      SI      24.77299
10:      DK      35.13728
11:      BE      35.29188
12:      MT      36.66667
13:      IE      36.78327
14:      GB      44.26618
15:      NL      49.99392
16:      IS      57.00000
17:      NO      64.31373
18:      PT      93.32353
19:      CH     578.91738
20:      AT            NA
21:      BG            NA
22:      DE            NA
23:      ES            NA
24:      FI            NA
25:      GR            NA
26:      HU            NA
27:      LT            NA
28:      LU            NA
29:      PL            NA
30:      RO            NA
31:      SE            NA
32:      SK            NA
    country cases_to_hosp
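
As a rough illustration, the table above could also be screened automatically. The sketch below, in Python with only the standard library, takes the non-missing ratios printed above and flags any country whose ratio differs from the cross-country median by more than a chosen factor. The function name and the factor of 5 are arbitrary assumptions made for this example, not a rule used by the hub.

```python
from statistics import median

# Median case-to-admission ratios copied from the table above (NA rows omitted).
ratios = {
    "CZ": 10.92092, "LV": 13.42227, "HR": 15.29288, "LI": 15.50000,
    "IT": 17.26066, "FR": 19.57550, "CY": 21.67500, "EE": 24.01233,
    "SI": 24.77299, "DK": 35.13728, "BE": 35.29188, "MT": 36.66667,
    "IE": 36.78327, "GB": 44.26618, "NL": 49.99392, "IS": 57.00000,
    "NO": 64.31373, "PT": 93.32353, "CH": 578.91738,
}


def flag_outliers(ratios, factor=5.0):
    """Return countries whose ratio is more than `factor` times above or below the median."""
    centre = median(ratios.values())
    return {
        country: ratio
        for country, ratio in ratios.items()
        if ratio > centre * factor or ratio < centre / factor
    }


flagged = flag_outliers(ratios)
print(flagged)
```

With these values, Switzerland is the only country flagged.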

kathsherratt commented 3 years ago

Thank you for raising this @kraus-stat - I'll have a look at the original data source (which we have a bit more flexibility with than the ECDC sources).

We have also just seen problems with data in Slovakia, which we will likely need to exclude.

I'll send a further update on both of these and we can discuss again on Wednesday.

kathsherratt commented 2 years ago

See latest update from #1250.

We re-ran our data stability check on hospitalisation data. This changed the set of countries we are able to include. In this check we also found we could reduce the amount of data truncation while maintaining a stable data source, so for nearly all countries, data in the hub hospitalisation "truth data" file are now up to date.

Changes:

The complete list of countries against which we can evaluate hospitalisation forecasts is now:

More information:

sbfnk commented 1 year ago

Superseded by new data source in #2823