covid19datahub / COVID19

A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution
https://covid19datahub.io
GNU General Public License v3.0

In Switzerland: cumulative tests are not complete #204

Closed: Marina-Antillon closed this issue 2 years ago

Marina-Antillon commented 2 years ago

In Switzerland, the cumulative number of tests does not begin until 22-5-2020, but I know there are test data from the beginning of transmission:

https://opendata.swiss/en/dataset/covid-19-schweiz
Look for: Covid19Test_geoRegion_PCR_Antigen.csv

Edit: I added it to the thread on adding new data sources.

eguidotti commented 2 years ago

Thanks! I just had a look at https://github.com/covid19datahub/COVID19/issues/179

It seems that the source suggested for Switzerland also starts after 22-5-2020 (there are some rows before that date, but their values are NA).

Below is the code I used to check this (Switzerland, level 1; the results are similar for the cantons):

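library(dplyr)

# Test data file (PCR + antigen) from the covid19.admin.ch API
url <- "https://www.covid19.admin.ch/api/data/20220325-gjomvciq/sources/COVID19Test_geoRegion_PCR_Antigen.csv"
data <- read.csv(url)

# Switzerland as a whole (level 1): total tests per day, dropping days with missing values
swiss <- data %>%
  filter(geoRegion == "CH") %>%
  group_by(datum) %>%
  summarise(tests = sum(sumTotal)) %>%  # NA if any row for that day is NA
  arrange(datum) %>%
  na.omit()

# First date with a non-missing total (ISO date strings sort chronologically)
min(swiss$datum) # -> gives "2020-05-23"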
Marina-Antillon commented 2 years ago

I am very sorry! I actually use the data downloaded from here: https://www.covid19.admin.ch/en/epidemiologic/test?time=total&epiRelDev=abs

I thought the data on opendata.swiss was supposed to mirror it! I suppose not.

If you scroll to the bottom of the page and click "data as .csv file", a zip file will download. In it, the file "COVID19Test_geoRegion_PCR_Antigen.csv" has the columns "entries", "entries_pos" and "entries_neg" starting from 24/2/2020.

eguidotti commented 2 years ago

They should indeed mirror each other. I downloaded the folder and checked again, but I still get:

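library(dplyr)

# Locate the file COVID19Test_geoRegion_PCR_Antigen.csv from the downloaded zip
file <- file.choose()

data <- read.csv(file, na.strings = "NA")

# Same aggregation as before: daily totals for Switzerland, days with NA dropped
swiss <- data %>%
  filter(geoRegion == "CH") %>%
  group_by(datum) %>%
  summarise(tests = sum(sumTotal)) %>%
  arrange(datum) %>%
  na.omit()

min(swiss$datum) # -> gives "2020-05-23"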

Note 1: here I'm using sumTotal instead of "entries", "entries_pos", "entries_neg", but entries also seem to be missing before 2020-05-23. Could it be that you are using data from before 2020-05-23 only for some specific cantons?

Note 2: This repository actually pulls the data for Switzerland from the same link you mention, but in .json format rather than the .csv files or opendata.swiss (I hope they do not provide different data in the different formats!)
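One way to answer the question in Note 1 is to list which geoRegion values have a non-missing count before the cutoff date. Below is a minimal base-R sketch; the helper regions_before and the toy data frame are hypothetical (the values are made up, not real FOPH figures), and only the column names datum, geoRegion and entries come from the file discussed in this thread.

```r
# Hypothetical helper: which geoRegions have a non-missing value in `column`
# strictly before `cutoff`? `datum` holds ISO strings ("YYYY-MM-DD"),
# converted with as.Date before comparing.
regions_before <- function(data, column, cutoff) {
  dates <- as.Date(data$datum)
  has_value <- !is.na(data[[column]]) & dates < as.Date(cutoff)
  sort(unique(data$geoRegion[has_value]))
}

# Toy data for illustration only: CH is NA early on, CHFL has values from the start
toy <- data.frame(
  datum     = c("2020-02-24", "2020-05-23", "2020-02-24", "2020-05-23", "2020-05-23"),
  geoRegion = c("CH", "CH", "CHFL", "CHFL", "ZH"),
  entries   = c(NA, 1200, 35, 1250, 300),
  stringsAsFactors = FALSE
)

# Only CHFL has entries before 2020-05-23 in this toy data
regions_before(toy, "entries", "2020-05-23")
```

With the real file, regions_before(read.csv(file), "entries", "2020-05-23") would list the regions that have earlier data. Note that, unlike the dplyr code above, it treats a single non-missing row as coverage instead of requiring a non-NA daily sum.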

Marina-Antillon commented 2 years ago

Oh! I am using geoRegion == "CHFL" (Switzerland and Liechtenstein), which is fine for my analysis because Liechtenstein is very small, but I understand you might not want to use it for the larger database.
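The difference between the two start dates can be made explicit by computing the first non-missing date for CH and CHFL side by side. This is another hypothetical base-R sketch: first_dates is an illustrative helper, and the toy rows are invented to mimic the pattern described in this thread (CH from 2020-05-23, CHFL from 2020-02-24); they are not real data.

```r
# Hypothetical helper: earliest `datum` with a non-missing `column`, per region.
# Returns NA for a region with no non-missing values.
first_dates <- function(data, column, regions) {
  vapply(regions, function(r) {
    keep <- data$geoRegion == r & !is.na(data[[column]])
    if (!any(keep)) return(NA_character_)
    min(data$datum[keep])  # ISO date strings sort chronologically
  }, character(1))
}

# Toy data for illustration only
toy <- data.frame(
  datum     = c("2020-02-24", "2020-05-23", "2020-02-24", "2020-05-23"),
  geoRegion = c("CH", "CH", "CHFL", "CHFL"),
  sumTotal  = c(NA, 1200, 35, 1250),
  stringsAsFactors = FALSE
)

# Named vector: CH starts on 2020-05-23, CHFL on 2020-02-24 in this toy data
res <- first_dates(toy, "sumTotal", c("CH", "CHFL"))
print(res)
```

Keeping the region codes as names in the result makes it easy to see at a glance which aggregate (CH alone or CH plus Liechtenstein) has the longer series.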

eguidotti commented 2 years ago

Yes, Liechtenstein is treated independently from Switzerland in this repo. Anyway, thanks for checking this out!