CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

It's Official: The US is significantly outpacing Italy #1096

Open sibblegp opened 4 years ago

sibblegp commented 4 years ago

https://covid.bio/growth

I added a heatmap which shows how frightening this is.

ghost commented 4 years ago

Wow! And people still have not realized the full impact of this virus. Well done, a powerful message to share with others.

JiPiBi commented 4 years ago

I had the same idea: in France we are seeing the same trend as in Italy, with a delay of 7-8 days.

For Spain, to start from the same values, perhaps you could shift the beginning 2 days later, and they could turn out to be even worse?

And since the Italians are, I think, testing more than the others, the values for the other countries are probably below the real ones.

If you could compare the growth in deaths (I didn't see it?), that would also be interesting.

theronrr commented 4 years ago

Agreed that death rates by days in would be more interesting, but test kits in the US would also drive the numbers up by over 5x. It also takes approx. 25 days to go from confirmed to recovered.

But you have days in for the US starting at 3/4 instead of 1/22, and for Italy starting at 2/23. This skews the data and completely invalidates the chart.
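The two ways of counting "days in" being contrasted here can be sketched as follows: days since a common calendar date versus days since each country first reached a case threshold. This is only an illustrative sketch in base R; the countries `A` and `B` and all counts are made up and are not taken from the JHU CSSE data.

```r
# Hypothetical cumulative counts for two made-up countries (NOT real data).
cases <- data.frame(
  country = rep(c("A", "B"), each = 5),
  date    = rep(as.Date("2020-03-01") + 0:4, times = 2),
  value   = c(20, 60, 120, 250, 500,     # A first reaches 100 on 2020-03-03
              100, 210, 400, 800, 1500), # B first reaches 100 on 2020-03-01
  stringsAsFactors = FALSE
)

# Convention 1: days since a common calendar start date.
cases$days_since_start <- as.integer(cases$date - min(cases$date))

# Convention 2: days since each country first reached >= 100 cases.
threshold_day <- sapply(split(cases, cases$country), function(d) {
  as.integer(min(d$date[d$value >= 100]))
})
cases$days_since_100 <- unname(
  as.integer(cases$date) - threshold_day[as.character(cases$country)]
)

print(cases)
```

Under the second convention, negative values mark days before a country reached the threshold; a chart aligned this way would typically drop those rows.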

JiPiBi commented 4 years ago

For deaths, if you prefer a ratio instead of raw values, I think it's better to calculate deaths / population, because deaths / confirmed is not very reliable. IMHO it is only interesting for a given country that maintains the same rules. For example, if a person is declared confirmed and remains at home, do you add the whole family even if they are not tested, or not?
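A rough sketch of the difference between the two ratios, in base R. The countries and every number below are hypothetical and chosen only to show the effect: both made-up countries have the same deaths and population, but `B` confirms five times as many cases (for example, because it tests more).

```r
# Hypothetical illustrative counts (NOT real data).
counts <- data.frame(
  country    = c("A", "B"),
  deaths     = c(50, 50),
  confirmed  = c(1000, 5000),
  population = c(1e7, 1e7),
  stringsAsFactors = FALSE
)

# Deaths per confirmed case: changes with how many cases get confirmed.
counts$deaths_per_confirmed <- counts$deaths / counts$confirmed

# Deaths per 100,000 inhabitants: does not use the confirmed count at all.
counts$deaths_per_100k <- counts$deaths / counts$population * 1e5

print(counts)
```

In this toy example deaths / confirmed differs by a factor of five between the two countries, while deaths per 100,000 inhabitants is identical.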

mikemc commented 4 years ago

It looks like Italy is still outpacing the US in terms of deaths. Hopefully that doesn't change once US hospitals become overwhelmed.

Created on 2020-03-21 by the reprex package (v0.3.0)

R code:

library(tidyverse)
library(cowplot)
# other libraries used: lubridate, janitor

# Import data
data_path <- "~/data/COVID-19/csse_covid_19_data/csse_covid_19_time_series"
fns <- list.files(data_path, pattern = "\\.csv", full.names = TRUE)
tb <- tibble(file = fns) %>%
  mutate(
    type = str_extract(file, "[:alpha:]+(?=\\.csv)"),
    data = map(file, read_csv)
  ) %>%
  select(-file) %>%
  unnest(data)
# tidy up
master_tb <- tb %>%
  pivot_longer(
    # column names are dates in mo/dy/yr format
    matches("[0-9]+/[0-9]+/[0-9]+"),
    names_to = "date",
    values_to = "value",
    values_ptypes = list(value = integer())
  ) %>%
  mutate_at("date", lubridate::mdy) %>%
  janitor::clean_names()

# Aggregate w/in countries
country_tb <- master_tb %>%
  group_by(type, country_region, date) %>%
  summarize_at("value", sum) %>%
  rename(country = country_region)

# Get the date each country first hit >= 100 cases
first100 <- country_tb %>%
  group_by(country) %>%
  filter(type == "Confirmed", value >= 100) %>%
  top_n(-1, date) %>%
  arrange(date)

# Get a table w/ days since first hit 100 cases for each country that did so
tb <- right_join(
  country_tb, 
  first100 %>% select(country, date.first = date, value.first = value),
  by = "country"
) %>%
  mutate(
    # convert the difftime to a plain integer day count for plotting
    days_since_first_100 = as.integer(date - date.first)
  ) %>%
  filter(days_since_first_100 >= 0)

# Graph confirmed cases and deaths for the US and Italy
tb %>%
  filter(
    country %in% c("US", "Italy"),
    type != "Recovered"
  ) %>%
  ggplot(aes(days_since_first_100, value, color = country)) +
  geom_line() +
  facet_wrap(~type, ncol = 1, scales = "free_y") +
  scale_color_brewer(type = "qual") +
  scale_y_log10() +
  scale_x_continuous() +
  labs(
    x = "days since >= 100 confirmed cases"
  ) +
  theme_cowplot(12)

sibblegp commented 4 years ago

Italy has the 2nd oldest population in the world. Doesn't surprise me that it's outpacing on deaths.