epiforecasts / covidregionaldata

An interface to subnational and national level COVID-19 data. For all countries supported, this includes a daily time-series of cases. Wherever available we also provide data on deaths, hospitalisations, and tests. National level data is also supported using a range of data sources as well as linelist data and links to intervention data sets.
https://epiforecasts.io/covidregionaldata/

South Africa new reported cases higher than Our World In Data/WHO #474

Open bquilty25 opened 6 months ago

bquilty25 commented 6 months ago

I have been using this package to plot the waves in South Africa from 2020-2023. When updating the figure I noticed that the number of new cases reported each day was substantially higher than it was previously, and also higher than in data downloaded from Our World In Data:

[Figure: sa_cases]

github-actions[bot] commented 6 months ago

Thanks for opening an issue! We'll try and get back to you shortly. If you've identified an issue and would like to fix it please see our contribution guidelines.

seabbs commented 6 months ago

Thanks Billy. As you may have noticed, covidregionaldata hasn't had a patch in a fair while, so issues may have crept in. Can you post a reprex of what you are doing to get the data? In particular, which data source are you using for SA?

bquilty25 commented 6 months ago

Hey Sam,

It was with default arguments, so I guess WHO (see below). I haven't tried downloading from source separately yet (https://covid19.who.int/WHO-COVID-19-global-data.csv), but just wanted to flag.

library(covidregionaldata)
library(dplyr)
library(readr)
library(ggplot2)

# Default arguments, so presumably the WHO source
sa_dat <- get_national_data(countries = "South Africa")

# OWID bars with the covidregionaldata series overlaid as points
sa_plot <- read_csv("data/owid-covid-data.csv") %>%
  filter(iso_code == "ZAF") %>%
  mutate(date = as.Date(date)) %>%
  filter(date <= as.Date("2023-01-01")) %>%
  ggplot(aes(x = date, y = new_cases)) +
  geom_col(alpha = 0.75, aes(fill = "OWID/WHO")) +
  geom_point(
    data = sa_dat %>%
      mutate(date = as.Date(date)) %>%
      filter(date <= as.Date("2023-01-01")),
    # shape 21 is a fillable point, so the fill aesthetic actually applies
    aes(x = date, y = cases_new, fill = "covidregionaldata"),
    shape = 21
  ) +
  labs(x = "", y = "Daily reported cases") +
  scale_fill_brewer(palette = "Set2")

ggsave("results/sa_cases.png", plot = sa_plot, width = 200, height = 100, units = "mm", dpi = 600, bg = "white")

bquilty25 commented 6 months ago

Ah looks like it may be an issue with the source data:

sa_dat <- get_national_data(countries = "South Africa")

sa_plot <- read_csv("data/owid-covid-data.csv") %>%
  filter(iso_code == "ZAF") %>%
  mutate(date = as.Date(date)) %>%
  filter(date <= as.Date("2023-01-01")) %>%
  ggplot(aes(x = date, y = new_cases)) +
  geom_col(alpha = 0.75, aes(colour = "OWID/WHO")) +
  geom_point(
    data = sa_dat %>%
      mutate(date = as.Date(date)) %>%
      filter(date <= as.Date("2023-01-01")),
    aes(x = date, y = cases_new, colour = "covidregionaldata")
  ) +
  geom_line(
    data = read_csv("https://covid19.who.int/WHO-COVID-19-global-data.csv") %>%
      filter(Country == "South Africa") %>%
      filter(Date_reported <= as.Date("2023-01-01")),
    aes(x = Date_reported, y = New_cases, colour = "WHO")
  ) +
  labs(x = "", y = "Daily reported cases") +
  scale_colour_brewer(palette = "Set2")

[Figure: sa_cases]
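
To put a number on the gap rather than judging it from the plot, the series can be joined by date and compared directly. The following is a minimal sketch, not part of covidregionaldata: `compare_sources()` is a hypothetical helper, and the two small data frames are toy values made up for illustration, standing in for `sa_dat` (column `cases_new`) and the OWID extract (column `new_cases`).

```r
library(dplyr)

# Hypothetical helper (not part of covidregionaldata): align two case
# series on date and compute the per-day difference and ratio.
compare_sources <- function(x, y, x_col, y_col) {
  x <- x %>% transmute(date = as.Date(date), cases_x = .data[[x_col]])
  y <- y %>% transmute(date = as.Date(date), cases_y = .data[[y_col]])
  inner_join(x, y, by = "date") %>%
    mutate(
      difference = cases_x - cases_y,
      # Avoid dividing by zero on days with no reported cases
      ratio = if_else(cases_y > 0, cases_x / cases_y, NA_real_)
    )
}

# Toy values for illustration only, NOT real case counts
pkg <- tibble(date = as.Date("2022-01-01") + 0:2, cases_new = c(700, 1400, 0))
owid <- tibble(date = as.Date("2022-01-01") + 0:2, new_cases = c(100, 200, 0))

cmp <- compare_sources(pkg, owid, "cases_new", "new_cases")

# Overall totals and the typical per-day ratio between the two sources
summarise(
  cmp,
  total_x = sum(cases_x),
  total_y = sum(cases_y),
  median_ratio = median(ratio, na.rm = TRUE)
)
```

With the real data this would be called as `compare_sources(sa_dat, owid_zaf, "cases_new", "new_cases")`, where `owid_zaf` is an assumed name for the filtered OWID data frame. A roughly constant ratio might suggest an aggregation difference between the sources, whereas isolated spikes might suggest revisions or backfilled reports. Either reading would need checking against the source files.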