AntoineSoetewey / statsandr

A blog on statistics and R aiming at helping academics and professionals working with data to grasp important concepts in statistics and to apply them in R. See www.statsandr.com
http://statsandr.com/
35 stars 16 forks source link

blog/covid-19-in-belgium/ #29

Closed utterances-bot closed 3 years ago

utterances-bot commented 3 years ago

COVID-19 in Belgium - Stats and R

This article presents an analysis of the Novel COVID-19 Coronavirus in Belgium using R. Feel free to apply it to your own country

https://statsandr.com/blog/covid-19-in-belgium/

AntoineSoetewey commented 3 years ago

Comment written by Ughtie Valdez on May 10, 2020 16:00:00:

Hello Sir,

First of all want to thank you for making the code available! HERO

Next the question: How do I validate my model, or the parameters beta and gamma. What kind of statistical test is there available for SIR. Which is easy to conduct.

I have my presentation tomorrow

Comment written by Antoine Soetewey on May 10, 2020 16:05:49:

Dear Ughtie,

You can start by: 

Good luck with your presentation!

Best, 
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Ughtie Valdez on May 10, 2020 16:00:00: Hello Sir, First of all want to thank you for making the code available! HERO Next the question: How do I validate my model, or the parameters beta and gamma. What kind of statistical test is there available for SIR. Which is easy to conduct. I have my presentation tomorrow

Comment written by Antoine Soetewey on May 10, 2020 16:05:49:

Dear Ughtie,

You can start by: 

  • making sure the model converges
  • see whether the incidence predicted by the model fits well with the observed incidence

Good luck with your presentation!

Best,  Antoine

Comment written by Ughtie Valdez on May 10, 2020 16:15:04:

Dear Antoine Soetewey,

The observed incidence doesn't follow the predicted    incidence very well. Observerd follows a log trend and predicted exponential.
The model is already converged.

This was for Belgium though with an other dataset than in the example. I have done this for 11 countries.

So is there anything else I can do  ? For validation of the models?

AntoineSoetewey commented 3 years ago

Comment written by Ughtie Valdez on May 10, 2020 16:00:00: Hello Sir, First of all want to thank you for making the code available! HERO Next the question: How do I validate my model, or the parameters beta and gamma. What kind of statistical test is there available for SIR. Which is easy to conduct. I have my presentation tomorrow

Comment written by Antoine Soetewey on May 10, 2020 16:05:49: Dear Ughtie, You can start by: 

  • making sure the model converges
  • see whether the incidence predicted by the model fits well with the observed incidence

Good luck with your presentation! Best,  Antoine

Comment written by Ughtie Valdez on May 10, 2020 16:15:04:

Dear Antoine Soetewey,

The observed incidence doesn't follow the predicted    incidence very well. Observerd follows a log trend and predicted exponential. The model is already converged.

This was for Belgium though with an other dataset than in the example. I have done this for 11 countries.

So is there anything else I can do  ? For validation of the models?

Comment written by Antoine Soetewey on May 10, 2020 20:47:47:

I'm not aware of any other tools to validate this model unfortunately. Epidemiologists may be able to help you further about this.

AntoineSoetewey commented 3 years ago

Comment written by Jean claude GBADA on May 10, 2020 15:49:58: Hello Sir,   I have a problem. The package "deSolve" is not available for my version of R 3.6.3. It is there any solution I could find or may I download a new version of R?   Thanks

Comment written by Antoine Soetewey on May 10, 2020 15:56:13:

Dear Jean Claude, 

Did you try after updating R? 

Best,  Antoine

Comment written by Jean claude GBADA on May 13, 2020 15:15:30:

Yes it is Okay now, thanks!

AntoineSoetewey commented 3 years ago

Comment written by José Moniz Fernandes on April 16, 2020 01:01:45: Hi,  Other problem. Why the data isn't update?  I apply for Cabo Verde data the code https://statsandr.com/blog/covid-19-in-belgium/ SIR <- function(time, state, parameters) {    par <- as.list(c(state, parameters))    with(par, {      dS <- -beta I S / N      dI <- beta I S / N - gamma I      dR <- gamma I      list(c(dS, dI, dR))    })  } #to create a vector with the daily cumulative incidence for Belgium, from February 4 (when our daily incidence data starts)  # devtools::install_github("RamiKrispin/coronavirus")  library(coronavirus)  data(coronavirus) %&gt;% <- magrittr::%&gt;% # extract the cumulative incidence  df <- coronavirus %>%    dplyr::filter(Country.Region == "Cabo Verde") %>%    dplyr::group_by(date, type) %>%    dplyr::summarise(total = sum(cases, na.rm = TRUE)) %>%    tidyr::pivot_wider(      names_from = type,      values_from = total    ) %>%    dplyr::arrange(date) %>%    dplyr::ungroup() %>%    dplyr::mutate(active = confirmed - death - recovered) %>%    dplyr::mutate(      confirmed_cum = cumsum(confirmed),      death_cum = cumsum(death),      recovered_cum = cumsum(recovered),      active_cum = cumsum(active)    ) # put the daily cumulative incidence numbers for Belgium from  # Feb 4 to March 30 into a vector called Infected  library(lubridate)  Infected <- subset(df, date >= ymd("2020-03-20") & date <= ymd("2020-04-12"))$active_cum # Create an incrementing Day vector the same length as our  # cases vector  Day <- 1:(length(Infected)) # now specify initial values for N, S, I and R  N <- 556586  init <- c(    S = N - Infected[1],    I = Infected[1],    R = 0  ) # define a function to calculate the residual sum of squares  # (RSS), passing in parameters beta and gamma that are to be  # optimised for the best fit to the incidence data  RSS <- function(parameters) {    names(parameters) <- c("beta", "gamma")    out <- ode(y = init, times = Day, func = SIR, parms = parameters)    fit <- out[, 3]    sum((Infected - fit)^2)  } # now find the values of beta and gamma that give the  # smallest RSS, which represents the best fit to the data.  # Start with values of 0.5 for each, and constrain them to  # the interval 0 to 1.0 # install.packages("deSolve")  library(deSolve) Opt <- optim(c(0.5, 0.5),               RSS,               method = "L-BFGS-B",               lower = c(0, 0),               upper = c(1, 1)  ) # check for convergence  Opt$message Opt_par <- setNames(Opt$par, c("beta", "gamma"))  Opt_par sir_start_date <- "2020-03-20" # time in days for predictions  t <- 1:as.integer(ymd("2020-04-13") - ymd(sir_start_date)) # get the fitted values from our SIR model  fitted_cumulative_incidence <- data.frame(ode(    y = init, times = t,    func = SIR, parms = Opt_par  )) # add a Date column and the observed incidence data  library(dplyr)  fitted_cumulative_incidence <- fitted_cumulative_incidence %>%    mutate(      Date = ymd(sir_start_date) + days(t - 1),      Country = "Cabo Verde",      cumulative_incident_cases = Infected    ) # plot the data  library(ggplot2)  fitted_cumulative_incidence %>%    ggplot(aes(x = Date)) +    geom_line(aes(y = I), colour = "red") +    geom_point(aes(y = cumulative_incident_cases), colour = "blue") +    labs(      y = "Cumulative incidence",      title = "COVID-19 fitted vs observed cumulative incidence, Belgium",      subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"    ) +    theme_minimal() fitted_cumulative_incidence %>%    ggplot(aes(x = Date)) +    geom_line(aes(y = I), colour = "red") +    geom_point(aes(y = cumulative_incident_cases), colour = "blue") +    labs(      y = "Cumulative incidence",      title = "COVID-19 fitted vs observed cumulative incidence, Belgium",      subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"    ) +    theme_minimal() +    scale_y_log10(labels = scales::comma) #We can compute it in R:   Opt_par    R0 <- as.numeric(Opt_par[1] / Opt_par[2])    R0   # time in days for predictions    t <- 1:120   # get the fitted values from our SIR model    fitted_cumulative_incidence <- data.frame(ode(      y = init, times = t,      func = SIR, parms = Opt_par    ))   # add a Date column and join the observed incidence data    fitted_cumulative_incidence <- fitted_cumulative_incidence %>%      mutate(        Date = ymd(sir_start_date) + days(t - 1),        Country = "Cabo Verde",        cumulative_incident_cases = c(Infected, rep(NA, length(t) - length(Infected)))      )   # plot the data    fitted_cumulative_incidence %>%      ggplot(aes(x = Date)) +      geom_line(aes(y = I), colour = "red") +      geom_line(aes(y = S), colour = "black") +      geom_line(aes(y = R), colour = "green") +      geom_point(aes(y = cumulative_incident_cases),                 colour = "blue"      ) +      scale_y_continuous(labels = scales::comma) +      labs(y = "Persons", title = "COVID-19 fitted vs observed cumulative incidence, Belgium") +      scale_colour_manual(name = "", values = c(        red = "red", black = "black",        green = "green", blue = "blue"      ), labels = c(        "Susceptible",        "Recovered", "Observed incidence", "Infectious"      )) +      theme_minimal()   # plot the data    fitted_cumulative_incidence %>%      ggplot(aes(x = Date)) +      geom_line(aes(y = I, colour = "red")) +      geom_line(aes(y = S, colour = "black")) +      geom_line(aes(y = R, colour = "green")) +      geom_point(aes(y = cumulative_incident_cases, colour = "blue")) +      scale_y_log10(labels = scales::comma) +      labs(        y = "Persons",        title = "COVID-19 fitted vs observed cumulative incidence, Belgium"      ) +      scale_colour_manual(        name = "",        values = c(red = "red", black = "black", green = "green", blue = "blue"),        labels = c("Susceptible", "Observed incidence", "Recovered", "Infectious")      ) +      theme_minimal()   fit <- fitted_cumulative_incidence   # peak of pandemic    fit[fit$I == max(fit$I), c("Date", "I")]   # severe cases    max_infected <- max(fit$I)    max_infected / 5   # cases with need for intensive care    max_infected 0.06      # deaths with supposed 0.7% fatality rate    max_infected 0.007

Comment written by Antoine Soetewey on April 16, 2020 09:08:47: Dear José, To update the dataset, you need to use the update_datasets() function or reinstall the {coronavirus} package with devtools::install_github("RamiKrispin/coronavirus").  After doing this, check that you have the latest available data before running your code. Hope this helps. Regards, Antoine

Comment written by José Moniz Fernandes on April 17, 2020 00:59:20: Thanks.

Comment written by José Moniz Fernandes on April 17, 2020 01:12:08: Why # severe cases  max_infected <- max(fit$I)  max_infected / 5 or why five (5)?

Comment written by José Moniz Fernandes on April 17, 2020 01:14:21: Why   # cases with need for intensive care  max_infected * 0.06  ??

Comment written by José Moniz Fernandes on April 17, 2020 01:14:58: Why?    # deaths with supposed 0.7% fatality rate    max_infected * 0.007 ????  Where do have this number for my country?

Comment written by Antoine Soetewey on April 17, 2020 07:26:25: I'll reply to all of your comments in one comment: As written in the post, the 0.7% fatality rate comes from this source. For your country, simply take the same figure or do some research to estimate the fatality rate in your country. Regarding severe cases: dividing by 5 is like multiplying by 0.2, and 20% means that 1 out of every 5 cases is considered as severe case. 6% and 20% of infected cases for intensive care and severe cases (respectively) are estimates. As I already said in another comment, they may be different in your country. For more information where I found these estimates, see the end of this article where the author applies the analyses to Germany. As for fatality rate, if you believe that these estimates are wrong (and it can be the case for countries other than Belgium), do some research and try to find better estimates for your own country. PS: for your information, typing "????" does not add any value compared to "?", except that it looks aggressive. Regards, Antoine

Comment written by José Moniz Fernandes on April 17, 2020 14:27:12: I understood. In Cabo Verde is   # deaths with supposed 1.8% fatality rate  max_infected * 0.018. Thanks.  Where control I the first data in the first plot (summary cases), for example, the first confirmed case is 20/03/20. I don't need  before the 15 of mars.

Comment written by Antoine Soetewey on April 17, 2020 16:00:14:

You can exclude the data before March 15 with date >= ymd("2020-03-15") and edit the sir_start_date.

Comment written by José Moniz Fernandes on May 18, 2020 01:26:11:

i get the error below when i try to knitt the code

+   dplyr::mutate(country = dplyr::if_else(country == "Mainland China", "China", country)) %>% +   dplyr::mutate(country = dplyr::if_else(country == "North Macedonia", "N.Macedonia", country)) %>% +   dplyr::mutate(country = trimws(country)) %>% +   dplyr::mutate(country = factor(country, levels = country)) Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :   namespace rlang 0.4.5 is already loaded, but >= 0.4.6 is required

AntoineSoetewey commented 3 years ago

Comment written by José Moniz Fernandes on April 16, 2020 01:01:45: Hi,  Other problem. Why the data isn't update?  I apply for Cabo Verde data the code https://statsandr.com/blog/covid-19-in-belgium/ SIR <- function(time, state, parameters) {    par <- as.list(c(state, parameters))    with(par, {      dS <- -beta I S / N      dI <- beta I S / N - gamma I      dR <- gamma I      list(c(dS, dI, dR))    })  } #to create a vector with the daily cumulative incidence for Belgium, from February 4 (when our daily incidence data starts)  # devtools::install_github("RamiKrispin/coronavirus")  library(coronavirus)  data(coronavirus) %&gt;% <- magrittr::%&gt;% # extract the cumulative incidence  df <- coronavirus %>%    dplyr::filter(Country.Region == "Cabo Verde") %>%    dplyr::group_by(date, type) %>%    dplyr::summarise(total = sum(cases, na.rm = TRUE)) %>%    tidyr::pivot_wider(      names_from = type,      values_from = total    ) %>%    dplyr::arrange(date) %>%    dplyr::ungroup() %>%    dplyr::mutate(active = confirmed - death - recovered) %>%    dplyr::mutate(      confirmed_cum = cumsum(confirmed),      death_cum = cumsum(death),      recovered_cum = cumsum(recovered),      active_cum = cumsum(active)    ) # put the daily cumulative incidence numbers for Belgium from  # Feb 4 to March 30 into a vector called Infected  library(lubridate)  Infected <- subset(df, date >= ymd("2020-03-20") & date <= ymd("2020-04-12"))$active_cum # Create an incrementing Day vector the same length as our  # cases vector  Day <- 1:(length(Infected)) # now specify initial values for N, S, I and R  N <- 556586  init <- c(    S = N - Infected[1],    I = Infected[1],    R = 0  ) # define a function to calculate the residual sum of squares  # (RSS), passing in parameters beta and gamma that are to be  # optimised for the best fit to the incidence data  RSS <- function(parameters) {    names(parameters) <- c("beta", "gamma")    out <- ode(y = init, times = Day, func = SIR, parms = parameters)    fit <- out[, 3]    sum((Infected - fit)^2)  } # now find the values of beta and gamma that give the  # smallest RSS, which represents the best fit to the data.  # Start with values of 0.5 for each, and constrain them to  # the interval 0 to 1.0 # install.packages("deSolve")  library(deSolve) Opt <- optim(c(0.5, 0.5),               RSS,               method = "L-BFGS-B",               lower = c(0, 0),               upper = c(1, 1)  ) # check for convergence  Opt$message Opt_par <- setNames(Opt$par, c("beta", "gamma"))  Opt_par sir_start_date <- "2020-03-20" # time in days for predictions  t <- 1:as.integer(ymd("2020-04-13") - ymd(sir_start_date)) # get the fitted values from our SIR model  fitted_cumulative_incidence <- data.frame(ode(    y = init, times = t,    func = SIR, parms = Opt_par  )) # add a Date column and the observed incidence data  library(dplyr)  fitted_cumulative_incidence <- fitted_cumulative_incidence %>%    mutate(      Date = ymd(sir_start_date) + days(t - 1),      Country = "Cabo Verde",      cumulative_incident_cases = Infected    ) # plot the data  library(ggplot2)  fitted_cumulative_incidence %>%    ggplot(aes(x = Date)) +    geom_line(aes(y = I), colour = "red") +    geom_point(aes(y = cumulative_incident_cases), colour = "blue") +    labs(      y = "Cumulative incidence",      title = "COVID-19 fitted vs observed cumulative incidence, Belgium",      subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"    ) +    theme_minimal() fitted_cumulative_incidence %>%    ggplot(aes(x = Date)) +    geom_line(aes(y = I), colour = "red") +    geom_point(aes(y = cumulative_incident_cases), colour = "blue") +    labs(      y = "Cumulative incidence",      title = "COVID-19 fitted vs observed cumulative incidence, Belgium",      subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"    ) +    theme_minimal() +    scale_y_log10(labels = scales::comma) #We can compute it in R:   Opt_par    R0 <- as.numeric(Opt_par[1] / Opt_par[2])    R0   # time in days for predictions    t <- 1:120   # get the fitted values from our SIR model    fitted_cumulative_incidence <- data.frame(ode(      y = init, times = t,      func = SIR, parms = Opt_par    ))   # add a Date column and join the observed incidence data    fitted_cumulative_incidence <- fitted_cumulative_incidence %>%      mutate(        Date = ymd(sir_start_date) + days(t - 1),        Country = "Cabo Verde",        cumulative_incident_cases = c(Infected, rep(NA, length(t) - length(Infected)))      )   # plot the data    fitted_cumulative_incidence %>%      ggplot(aes(x = Date)) +      geom_line(aes(y = I), colour = "red") +      geom_line(aes(y = S), colour = "black") +      geom_line(aes(y = R), colour = "green") +      geom_point(aes(y = cumulative_incident_cases),                 colour = "blue"      ) +      scale_y_continuous(labels = scales::comma) +      labs(y = "Persons", title = "COVID-19 fitted vs observed cumulative incidence, Belgium") +      scale_colour_manual(name = "", values = c(        red = "red", black = "black",        green = "green", blue = "blue"      ), labels = c(        "Susceptible",        "Recovered", "Observed incidence", "Infectious"      )) +      theme_minimal()   # plot the data    fitted_cumulative_incidence %>%      ggplot(aes(x = Date)) +      geom_line(aes(y = I, colour = "red")) +      geom_line(aes(y = S, colour = "black")) +      geom_line(aes(y = R, colour = "green")) +      geom_point(aes(y = cumulative_incident_cases, colour = "blue")) +      scale_y_log10(labels = scales::comma) +      labs(        y = "Persons",        title = "COVID-19 fitted vs observed cumulative incidence, Belgium"      ) +      scale_colour_manual(        name = "",        values = c(red = "red", black = "black", green = "green", blue = "blue"),        labels = c("Susceptible", "Observed incidence", "Recovered", "Infectious")      ) +      theme_minimal()   fit <- fitted_cumulative_incidence   # peak of pandemic    fit[fit$I == max(fit$I), c("Date", "I")]   # severe cases    max_infected <- max(fit$I)    max_infected / 5   # cases with need for intensive care    max_infected 0.06      # deaths with supposed 0.7% fatality rate    max_infected 0.007

Comment written by Antoine Soetewey on April 16, 2020 09:08:47: Dear José, To update the dataset, you need to use the update_datasets() function or reinstall the {coronavirus} package with devtools::install_github("RamiKrispin/coronavirus").  After doing this, check that you have the latest available data before running your code. Hope this helps. Regards, Antoine

Comment written by José Moniz Fernandes on April 17, 2020 00:59:20: Thanks.

Comment written by José Moniz Fernandes on April 17, 2020 01:12:08: Why # severe cases  max_infected <- max(fit$I)  max_infected / 5 or why five (5)?

Comment written by José Moniz Fernandes on April 17, 2020 01:14:21: Why   # cases with need for intensive care  max_infected * 0.06  ??

Comment written by José Moniz Fernandes on April 17, 2020 01:14:58: Why?    # deaths with supposed 0.7% fatality rate    max_infected * 0.007 ????  Where do have this number for my country?

Comment written by Antoine Soetewey on April 17, 2020 07:26:25: I'll reply to all of your comments in one comment: As written in the post, the 0.7% fatality rate comes from this source. For your country, simply take the same figure or do some research to estimate the fatality rate in your country. Regarding severe cases: dividing by 5 is like multiplying by 0.2, and 20% means that 1 out of every 5 cases is considered as severe case. 6% and 20% of infected cases for intensive care and severe cases (respectively) are estimates. As I already said in another comment, they may be different in your country. For more information where I found these estimates, see the end of this article where the author applies the analyses to Germany. As for fatality rate, if you believe that these estimates are wrong (and it can be the case for countries other than Belgium), do some research and try to find better estimates for your own country. PS: for your information, typing "????" does not add any value compared to "?", except that it looks aggressive. Regards, Antoine

Comment written by José Moniz Fernandes on April 17, 2020 14:27:12: I understood. In Cabo Verde is   # deaths with supposed 1.8% fatality rate  max_infected * 0.018. Thanks.  Where control I the first data in the first plot (summary cases), for example, the first confirmed case is 20/03/20. I don't need  before the 15 of mars.

Comment written by Antoine Soetewey on April 17, 2020 16:00:14: You can exclude the data before March 15 with date >= ymd("2020-03-15") and edit the sir_start_date.

Comment written by José Moniz Fernandes on May 18, 2020 01:26:11:

i get the error below when i try to knitt the code

+   dplyr::mutate(country = dplyr::if_else(country == "Mainland China", "China", country)) %>% +   dplyr::mutate(country = dplyr::if_else(country == "North Macedonia", "N.Macedonia", country)) %>% +   dplyr::mutate(country = trimws(country)) %>% +   dplyr::mutate(country = factor(country, levels = country)) Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :   namespace rlang 0.4.5 is already loaded, but >= 0.4.6 is required

Comment written by Antoine Soetewey on May 18, 2020 05:18:09:

Dear José,

Try again after having updated the {rlang} package:
install.packages("rlang") library(rlang)

and let me know.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17:

Good day Antoine,

I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17:

Good day Antoine,

I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

Comment written by Antoine Soetewey on May 18, 2020 14:03:03:

Dear Benjamin,

Are you sure you correctly installed and loaded the {coronavirus} package? You can do it with:
devtools::install_github("RamiKrispin/coronavirus", force = TRUE) library(coronavirus) data(coronavirus)

Hope this helps.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17: Good day Antoine, I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

Comment written by Antoine Soetewey on May 18, 2020 14:03:03:

Dear Benjamin,

Are you sure you correctly installed and loaded the {coronavirus} package? You can do it with: devtools::install_github("RamiKrispin/coronavirus", force = TRUE) library(coronavirus) data(coronavirus)

Hope this helps.

Best, Antoine

Comment written by olumide benjamin on May 18, 2020 17:39:02:

Thanks Antoine, it works.

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17: Good day Antoine, I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

Comment written by Antoine Soetewey on May 18, 2020 14:03:03: Dear Benjamin, Are you sure you correctly installed and loaded the {coronavirus} package? You can do it with: devtools::install_github("RamiKrispin/coronavirus", force = TRUE) library(coronavirus) data(coronavirus) Hope this helps. Best, Antoine

Comment written by olumide benjamin on May 18, 2020 17:39:02:

Thanks Antoine, it works.

Comment written by olumide benjamin on May 18, 2020 23:07:21:

Dear Antoine,

Please can you help me with code on SEIR and SPQEIR on R or resources that can help me.

Regards,
Benjamin

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17: Good day Antoine, I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

Comment written by Antoine Soetewey on May 18, 2020 14:03:03: Dear Benjamin, Are you sure you correctly installed and loaded the {coronavirus} package? You can do it with: devtools::install_github("RamiKrispin/coronavirus", force = TRUE) library(coronavirus) data(coronavirus) Hope this helps. Best, Antoine

Comment written by olumide benjamin on May 18, 2020 17:39:02: Thanks Antoine, it works.

Comment written by olumide benjamin on May 18, 2020 23:07:21:

Dear Antoine,

Please can you help me with code on SEIR and SPQEIR on R or resources that can help me.

Regards, Benjamin

Comment written by Antoine Soetewey on May 19, 2020 06:00:21:

Dear Benjamin,

Hope this helps.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Michal Krzus on May 25, 2020 23:02:06:

Hi I have a problem with do this plots for Poland :(  I have an issues with optim function. parameters beta and gamma don't work for Poland-data.  The output is 0.5 0.5  instead of diffretent values and later on the graphs the SIR models are equal to 0.  And instead of getting this: I tutaj: Opt <- optim(c(0.5, 0.5),   RSS,   method = "L-BFGS-B",   lower = c(0, 0),   upper = c(1, 1) ) # check for convergence Opt$message ## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH" Opt_par <- setNames(Opt$par, c("beta", "gamma")) Opt_par  

I'm getting this error "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL". Can someone please help me? Thanks in advance :)

AntoineSoetewey commented 3 years ago

Comment written by Michal Krzus on May 25, 2020 23:02:06:

Hi I have a problem with do this plots for Poland :(  I have an issues with optim function. parameters beta and gamma don't work for Poland-data.  The output is 0.5 0.5  instead of diffretent values and later on the graphs the SIR models are equal to 0.  And instead of getting this: I tutaj: Opt <- optim(c(0.5, 0.5),   RSS,   method = "L-BFGS-B",   lower = c(0, 0),   upper = c(1, 1) ) # check for convergence Opt$message ## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH" Opt_par <- setNames(Opt$par, c("beta", "gamma")) Opt_par  

I'm getting this error "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL". Can someone please help me? Thanks in advance :)

Comment written by Antoine Soetewey on May 26, 2020 06:30:39:

Dear Michal,

Try again by changing the initial values for beta and gamma (the values 0.5) and/or the upper constraints (the values 1) and see if it converges. You can also change the constraints or the method in the optim() function. (Run ?optim to see other optimization methods.)

The fact that you may find different estimates depending on the initial values proves that the fitting process is not stable. Here is a potential solution.

Hope this helps.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Imad MS on June 03, 2020 17:49:57: 

Why did you report cumulative incidence in the first graphs (with available) data, then in figure without public health intervention you demonstrate the personal number on Y-axis. What is the difference between the two cases?

AntoineSoetewey commented 3 years ago

Comment written by Imad MS on June 03, 2020 17:49:57: 

Why did you report cumulative incidence in the first graphs (with available) data, then in figure without public health intervention you demonstrate the personal number on Y-axis. What is the difference between the two cases?

Comment written by Antoine Soetewey on June 03, 2020 18:00:41:

Dear Imad,

Good question.

What is needed are currently infected persons (cumulative infected minus the removed, i.e. recovered or dead). However, numbers of recovered persons are hard to obtain and probably underestimated due to underreporting bias. I thus consider the cumulative number of infected people in the first graphs (which is not an issue since the number of recovered cases is probably negligible at the time of the analysis).

Regarding the figure without public intervention: as you can see the blue points still represent the cumulative incidence. However, I wrote Persons on the y-axis mainly because the 3 fitted curves (susceptible, recovered and infectious cases predicted by the model) represent the current number of cases, and not the cumulative number of cases.

Hope this helps.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Aniqa on May 09, 2020 09:01:38:

Dear Antoine,

Hope this message finds you well. I am from Bangladesh and was trying to do a coronavirus infection projection for my country using the data we have. I landed on your blog. Thanks, it was really helpful.

I have queries regarding your codes. While using the optim function, I found different values for beta and gamma. I think these estimates are not right. They seem to be local maxima not global. When I used initial values 0.5, 0.5 this did not even converge. Then I tried different seeds, and found different estimates. Could you please help, what did I do wrong?

Thanks in advance for your help.

Best,  Aniqa

Comment written by Antoine Soetewey on June 04, 2020 18:23:41:

Dear Aniqa,

You're right these estimates are local and not global, so estimates depend on initial values.

If it didn't converge, try again by changing the initial values for beta and gamma and see if it converges. You can also change the constraints or the method in the optim() function. The fact that you may find different estimates depending on the initial values proves that the fitting process is not stable. Here is a potential solution.

Hope this helps.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Thang HOANG on June 18, 2020 11:34:51:

Hi Antoine,

The output I in the SIR model is not cumulative incidence because infected persons will move to the recovery compartment. Looking at day 56, the number of infected persons is 10223, but the number of recovered persons is up to 25655. I think it does not make sense to compare fitted I in the SIR model with the observed cumulative incidences.  Please correct me if I am wrong.

AntoineSoetewey commented 3 years ago

Comment written by Thang HOANG on June 18, 2020 11:34:51:

Hi Antoine,

The output I in the SIR model is not cumulative incidence because infected persons will move to the recovery compartment. Looking at day 56, the number of infected persons is 10223, but the number of recovered persons is up to 25655. I think it does not make sense to compare fitted I in the SIR model with the observed cumulative incidences.  Please correct me if I am wrong.

Comment written by Antoine Soetewey on June 18, 2020 13:13:06:

Dear Thang,

Good question.

What is needed are currently infected persons (cumulative infected minus the removed, i.e. recovered or dead), as you correctly wrote. However, numbers of recovered persons are hard to obtain and probably underestimated due to underreporting bias. I thus consider the cumulative number of infected people, which is probably not an issue here since the number of recovered cases is negligible at the time of the analysis (March 31st).

Hope this helps.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17: Good day Antoine, I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

Comment written by Antoine Soetewey on May 18, 2020 14:03:03: Dear Benjamin, Are you sure you correctly installed and loaded the {coronavirus} package? You can do it with: devtools::install_github("RamiKrispin/coronavirus", force = TRUE) library(coronavirus) data(coronavirus) Hope this helps. Best, Antoine

Comment written by olumide benjamin on May 18, 2020 17:39:02: Thanks Antoine, it works.

Comment written by olumide benjamin on May 18, 2020 23:07:21: Dear Antoine, Please can you help me with code on SEIR and SPQEIR on R or resources that can help me. Regards, Benjamin

Comment written by Antoine Soetewey on May 19, 2020 06:00:21:

Dear Benjamin,

Hope this helps.

Best, Antoine

Comment written by olumide benjamin on June 24, 2020 21:42:00:

Hi Antoine,

Is there any tuning i need to do as the predicted cases is extremely high for instance, this time next month the model predicted 34611 infected cases but presently we 21371 confirmed cases and 13500 active cases.

Regards,
Olumide

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17: Good day Antoine, I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

Comment written by Antoine Soetewey on May 18, 2020 14:03:03: Dear Benjamin, Are you sure you correctly installed and loaded the {coronavirus} package? You can do it with: devtools::install_github("RamiKrispin/coronavirus", force = TRUE) library(coronavirus) data(coronavirus) Hope this helps. Best, Antoine

Comment written by olumide benjamin on May 18, 2020 17:39:02: Thanks Antoine, it works.

Comment written by olumide benjamin on May 18, 2020 23:07:21: Dear Antoine, Please can you help me with code on SEIR and SPQEIR on R or resources that can help me. Regards, Benjamin

Comment written by Antoine Soetewey on May 19, 2020 06:00:21: Dear Benjamin,

Hope this helps. Best, Antoine

Comment written by olumide benjamin on June 24, 2020 21:42:00:

Hi Antoine,

Is there any tuning i need to do as the predicted cases is extremely high for instance, this time next month the model predicted 34611 infected cases but presently we 21371 confirmed cases and 13500 active cases.

Regards, Olumide

Comment written by olumide benjamin on June 24, 2020 21:43:30:

I mean 346111 infected cases

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17: Good day Antoine, I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

Comment written by Antoine Soetewey on May 18, 2020 14:03:03: Dear Benjamin, Are you sure you correctly installed and loaded the {coronavirus} package? You can do it with: devtools::install_github("RamiKrispin/coronavirus", force = TRUE) library(coronavirus) data(coronavirus) Hope this helps. Best, Antoine

Comment written by olumide benjamin on May 18, 2020 17:39:02: Thanks Antoine, it works.

Comment written by olumide benjamin on May 18, 2020 23:07:21: Dear Antoine, Please can you help me with code on SEIR and SPQEIR on R or resources that can help me. Regards, Benjamin

Comment written by Antoine Soetewey on May 19, 2020 06:00:21: Dear Benjamin,

Hope this helps. Best, Antoine

Comment written by olumide benjamin on June 24, 2020 21:42:00: Hi Antoine, Is there any tuning i need to do as the predicted cases is extremely high for instance, this time next month the model predicted 34611 infected cases but presently we 21371 confirmed cases and 13500 active cases. Regards, Olumide

Comment written by olumide benjamin on June 24, 2020 21:43:30:

I mean 346111 infected cases

Comment written by Antoine Soetewey on June 24, 2020 23:15:31:

Hello,

Does the model fit well past data? You can check this by comparing the observed with the predicted cases.

If the model does not fit well, it's most likely that your beta and gamma estimates are not good. Try changing the initial values or the optimizing method.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17: Good day Antoine, I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

Comment written by Antoine Soetewey on May 18, 2020 14:03:03: Dear Benjamin, Are you sure you correctly installed and loaded the {coronavirus} package? You can do it with: devtools::install_github("RamiKrispin/coronavirus", force = TRUE) library(coronavirus) data(coronavirus) Hope this helps. Best, Antoine

Comment written by olumide benjamin on May 18, 2020 17:39:02: Thanks Antoine, it works.

Comment written by olumide benjamin on May 18, 2020 23:07:21: Dear Antoine, Please can you help me with code on SEIR and SPQEIR on R or resources that can help me. Regards, Benjamin

Comment written by Antoine Soetewey on May 19, 2020 06:00:21: Dear Benjamin,

Hope this helps. Best, Antoine

Comment written by olumide benjamin on June 24, 2020 21:42:00: Hi Antoine, Is there any tuning i need to do as the predicted cases is extremely high for instance, this time next month the model predicted 34611 infected cases but presently we 21371 confirmed cases and 13500 active cases. Regards, Olumide

Comment written by olumide benjamin on June 24, 2020 21:43:30: I mean 346111 infected cases

Comment written by Antoine Soetewey on June 24, 2020 23:15:31:

Hello,

Does the model fit well past data? You can check this by comparing the observed with the predicted cases.

If the model does not fit well, it's most likely that your beta and gamma estimates are not good. Try changing the initial values or the optimizing method.

Best, Antoine

Comment written by olumide benjamin on June 25, 2020 10:46:42:

Hi Antoine,

The model first well with the previous data, also I ran it on Facebook prophet and the result is the same.

Regards,
Olumide

AntoineSoetewey commented 3 years ago

Comment written by olumide benjamin on May 18, 2020 13:59:17: Good day Antoine, I am applying it to Nigeria but got this error Error: object 'confirmed' not found  and tried it for Belgium but got Error: object 'death' not found.  I tried update_dataset(), but got package not found error. Thanks for your help.

Comment written by Antoine Soetewey on May 18, 2020 14:03:03: Dear Benjamin, Are you sure you correctly installed and loaded the {coronavirus} package? You can do it with: devtools::install_github("RamiKrispin/coronavirus", force = TRUE) library(coronavirus) data(coronavirus) Hope this helps. Best, Antoine

Comment written by olumide benjamin on May 18, 2020 17:39:02: Thanks Antoine, it works.

Comment written by olumide benjamin on May 18, 2020 23:07:21: Dear Antoine, Please can you help me with code on SEIR and SPQEIR on R or resources that can help me. Regards, Benjamin

Comment written by Antoine Soetewey on May 19, 2020 06:00:21: Dear Benjamin,

Hope this helps. Best, Antoine

Comment written by olumide benjamin on June 24, 2020 21:42:00: Hi Antoine, Is there any tuning i need to do as the predicted cases is extremely high for instance, this time next month the model predicted 34611 infected cases but presently we 21371 confirmed cases and 13500 active cases. Regards, Olumide

Comment written by olumide benjamin on June 24, 2020 21:43:30: I mean 346111 infected cases

Comment written by Antoine Soetewey on June 24, 2020 23:15:31: Hello, Does the model fit well past data? You can check this by comparing the observed with the predicted cases. If the model does not fit well, it's most likely that your beta and gamma estimates are not good. Try changing the initial values or the optimizing method. Best, Antoine

Comment written by olumide benjamin on June 25, 2020 10:46:42:

Hi Antoine,

The model first well with the previous data, also I ran it on Facebook prophet and the result is the same.

Regards, Olumide

Comment written by Antoine Soetewey on June 25, 2020 16:52:40:

Just to make sure, are you sure you didn't predict cumulative incidence instead of observed incidence?

AntoineSoetewey commented 3 years ago

Comment written by José Moniz Fernandes on April 16, 2020 01:01:45: Hi,  Other problem. Why the data isn't update?  I apply for Cabo Verde data the code https://statsandr.com/blog/covid-19-in-belgium/ SIR <- function(time, state, parameters) {    par <- as.list(c(state, parameters))    with(par, {      dS <- -beta I S / N      dI <- beta I S / N - gamma I      dR <- gamma I      list(c(dS, dI, dR))    })  } #to create a vector with the daily cumulative incidence for Belgium, from February 4 (when our daily incidence data starts)  # devtools::install_github("RamiKrispin/coronavirus")  library(coronavirus)  data(coronavirus) %&gt;% <- magrittr::%&gt;% # extract the cumulative incidence  df <- coronavirus %>%    dplyr::filter(Country.Region == "Cabo Verde") %>%    dplyr::group_by(date, type) %>%    dplyr::summarise(total = sum(cases, na.rm = TRUE)) %>%    tidyr::pivot_wider(      names_from = type,      values_from = total    ) %>%    dplyr::arrange(date) %>%    dplyr::ungroup() %>%    dplyr::mutate(active = confirmed - death - recovered) %>%    dplyr::mutate(      confirmed_cum = cumsum(confirmed),      death_cum = cumsum(death),      recovered_cum = cumsum(recovered),      active_cum = cumsum(active)    ) # put the daily cumulative incidence numbers for Belgium from  # Feb 4 to March 30 into a vector called Infected  library(lubridate)  Infected <- subset(df, date >= ymd("2020-03-20") & date <= ymd("2020-04-12"))$active_cum # Create an incrementing Day vector the same length as our  # cases vector  Day <- 1:(length(Infected)) # now specify initial values for N, S, I and R  N <- 556586  init <- c(    S = N - Infected[1],    I = Infected[1],    R = 0  ) # define a function to calculate the residual sum of squares  # (RSS), passing in parameters beta and gamma that are to be  # optimised for the best fit to the incidence data  RSS <- function(parameters) {    names(parameters) <- c("beta", "gamma")    out <- ode(y = init, times = Day, func = SIR, parms = parameters)    fit <- out[, 3]    sum((Infected - fit)^2)  } # now find the values of beta and gamma that give the  # smallest RSS, which represents the best fit to the data.  # Start with values of 0.5 for each, and constrain them to  # the interval 0 to 1.0 # install.packages("deSolve")  library(deSolve) Opt <- optim(c(0.5, 0.5),               RSS,               method = "L-BFGS-B",               lower = c(0, 0),               upper = c(1, 1)  ) # check for convergence  Opt$message Opt_par <- setNames(Opt$par, c("beta", "gamma"))  Opt_par sir_start_date <- "2020-03-20" # time in days for predictions  t <- 1:as.integer(ymd("2020-04-13") - ymd(sir_start_date)) # get the fitted values from our SIR model  fitted_cumulative_incidence <- data.frame(ode(    y = init, times = t,    func = SIR, parms = Opt_par  )) # add a Date column and the observed incidence data  library(dplyr)  fitted_cumulative_incidence <- fitted_cumulative_incidence %>%    mutate(      Date = ymd(sir_start_date) + days(t - 1),      Country = "Cabo Verde",      cumulative_incident_cases = Infected    ) # plot the data  library(ggplot2)  fitted_cumulative_incidence %>%    ggplot(aes(x = Date)) +    geom_line(aes(y = I), colour = "red") +    geom_point(aes(y = cumulative_incident_cases), colour = "blue") +    labs(      y = "Cumulative incidence",      title = "COVID-19 fitted vs observed cumulative incidence, Belgium",      subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"    ) +    theme_minimal() fitted_cumulative_incidence %>%    ggplot(aes(x = Date)) +    geom_line(aes(y = I), colour = "red") +    geom_point(aes(y = cumulative_incident_cases), colour = "blue") +    labs(      y = "Cumulative incidence",      title = "COVID-19 fitted vs observed cumulative incidence, Belgium",      subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"    ) +    theme_minimal() +    scale_y_log10(labels = scales::comma) #We can compute it in R:   Opt_par    R0 <- as.numeric(Opt_par[1] / Opt_par[2])    R0   # time in days for predictions    t <- 1:120   # get the fitted values from our SIR model    fitted_cumulative_incidence <- data.frame(ode(      y = init, times = t,      func = SIR, parms = Opt_par    ))   # add a Date column and join the observed incidence data    fitted_cumulative_incidence <- fitted_cumulative_incidence %>%      mutate(        Date = ymd(sir_start_date) + days(t - 1),        Country = "Cabo Verde",        cumulative_incident_cases = c(Infected, rep(NA, length(t) - length(Infected)))      )   # plot the data    fitted_cumulative_incidence %>%      ggplot(aes(x = Date)) +      geom_line(aes(y = I), colour = "red") +      geom_line(aes(y = S), colour = "black") +      geom_line(aes(y = R), colour = "green") +      geom_point(aes(y = cumulative_incident_cases),                 colour = "blue"      ) +      scale_y_continuous(labels = scales::comma) +      labs(y = "Persons", title = "COVID-19 fitted vs observed cumulative incidence, Belgium") +      scale_colour_manual(name = "", values = c(        red = "red", black = "black",        green = "green", blue = "blue"      ), labels = c(        "Susceptible",        "Recovered", "Observed incidence", "Infectious"      )) +      theme_minimal()   # plot the data    fitted_cumulative_incidence %>%      ggplot(aes(x = Date)) +      geom_line(aes(y = I, colour = "red")) +      geom_line(aes(y = S, colour = "black")) +      geom_line(aes(y = R, colour = "green")) +      geom_point(aes(y = cumulative_incident_cases, colour = "blue")) +      scale_y_log10(labels = scales::comma) +      labs(        y = "Persons",        title = "COVID-19 fitted vs observed cumulative incidence, Belgium"      ) +      scale_colour_manual(        name = "",        values = c(red = "red", black = "black", green = "green", blue = "blue"),        labels = c("Susceptible", "Observed incidence", "Recovered", "Infectious")      ) +      theme_minimal()   fit <- fitted_cumulative_incidence   # peak of pandemic    fit[fit$I == max(fit$I), c("Date", "I")]   # severe cases    max_infected <- max(fit$I)    max_infected / 5   # cases with need for intensive care    max_infected 0.06      # deaths with supposed 0.7% fatality rate    max_infected 0.007

Comment written by Antoine Soetewey on April 16, 2020 09:08:47: Dear José, To update the dataset, you need to use the update_datasets() function or reinstall the {coronavirus} package with devtools::install_github("RamiKrispin/coronavirus").  After doing this, check that you have the latest available data before running your code. Hope this helps. Regards, Antoine

Comment written by José Moniz Fernandes on April 17, 2020 00:59:20: Thanks.

Comment written by José Moniz Fernandes on April 17, 2020 01:12:08: Why # severe cases  max_infected <- max(fit$I)  max_infected / 5 or why five (5)?

Comment written by José Moniz Fernandes on April 17, 2020 01:14:21: Why   # cases with need for intensive care  max_infected * 0.06  ??

Comment written by José Moniz Fernandes on April 17, 2020 01:14:58: Why?    # deaths with supposed 0.7% fatality rate    max_infected * 0.007 ????  Where do have this number for my country?

Comment written by Antoine Soetewey on April 17, 2020 07:26:25: I'll reply to all of your comments in one comment: As written in the post, the 0.7% fatality rate comes from this source. For your country, simply take the same figure or do some research to estimate the fatality rate in your country. Regarding severe cases: dividing by 5 is like multiplying by 0.2, and 20% means that 1 out of every 5 cases is considered as severe case. 6% and 20% of infected cases for intensive care and severe cases (respectively) are estimates. As I already said in another comment, they may be different in your country. For more information where I found these estimates, see the end of this article where the author applies the analyses to Germany. As for fatality rate, if you believe that these estimates are wrong (and it can be the case for countries other than Belgium), do some research and try to find better estimates for your own country. PS: for your information, typing "????" does not add any value compared to "?", except that it looks aggressive. Regards, Antoine

Comment written by José Moniz Fernandes on April 17, 2020 14:27:12: I understood. In Cabo Verde is   # deaths with supposed 1.8% fatality rate  max_infected * 0.018. Thanks.  Where control I the first data in the first plot (summary cases), for example, the first confirmed case is 20/03/20. I don't need  before the 15 of mars.

Comment written by Antoine Soetewey on April 17, 2020 16:00:14: You can exclude the data before March 15 with date >= ymd("2020-03-15") and edit the sir_start_date.

Comment written by José Moniz Fernandes on May 18, 2020 01:26:11: i get the error below when i try to knitt the code +   dplyr::mutate(country = dplyr::if_else(country == "Mainland China", "China", country)) %>% +   dplyr::mutate(country = dplyr::if_else(country == "North Macedonia", "N.Macedonia", country)) %>% +   dplyr::mutate(country = trimws(country)) %>% +   dplyr::mutate(country = factor(country, levels = country)) Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :   namespace rlang 0.4.5 is already loaded, but >= 0.4.6 is required

Comment written by Antoine Soetewey on May 18, 2020 05:18:09:

Dear José,

Try again after having updated the {rlang} package: install.packages("rlang") library(rlang)

and let me know.

Best, Antoine

Comment written by José Moniz Fernandes on July 02, 2020 23:28:58:

Dear,

I have this error
Error: .onLoad failed in loadNamespace() for 'pkgload', details:   call: NULL   error: package backports does not have a namespace

AntoineSoetewey commented 3 years ago

Comment written by José Moniz Fernandes on April 16, 2020 01:01:45: Hi,  Other problem. Why the data isn't update?  I apply for Cabo Verde data the code https://statsandr.com/blog/covid-19-in-belgium/ SIR <- function(time, state, parameters) {    par <- as.list(c(state, parameters))    with(par, {      dS <- -beta I S / N      dI <- beta I S / N - gamma I      dR <- gamma I      list(c(dS, dI, dR))    })  } #to create a vector with the daily cumulative incidence for Belgium, from February 4 (when our daily incidence data starts)  # devtools::install_github("RamiKrispin/coronavirus")  library(coronavirus)  data(coronavirus) %&gt;% <- magrittr::%&gt;% # extract the cumulative incidence  df <- coronavirus %>%    dplyr::filter(Country.Region == "Cabo Verde") %>%    dplyr::group_by(date, type) %>%    dplyr::summarise(total = sum(cases, na.rm = TRUE)) %>%    tidyr::pivot_wider(      names_from = type,      values_from = total    ) %>%    dplyr::arrange(date) %>%    dplyr::ungroup() %>%    dplyr::mutate(active = confirmed - death - recovered) %>%    dplyr::mutate(      confirmed_cum = cumsum(confirmed),      death_cum = cumsum(death),      recovered_cum = cumsum(recovered),      active_cum = cumsum(active)    ) # put the daily cumulative incidence numbers for Belgium from  # Feb 4 to March 30 into a vector called Infected  library(lubridate)  Infected <- subset(df, date >= ymd("2020-03-20") & date <= ymd("2020-04-12"))$active_cum # Create an incrementing Day vector the same length as our  # cases vector  Day <- 1:(length(Infected)) # now specify initial values for N, S, I and R  N <- 556586  init <- c(    S = N - Infected[1],    I = Infected[1],    R = 0  ) # define a function to calculate the residual sum of squares  # (RSS), passing in parameters beta and gamma that are to be  # optimised for the best fit to the incidence data  RSS <- function(parameters) {    names(parameters) <- c("beta", "gamma")    out <- ode(y = init, times = Day, func = SIR, parms = parameters)    fit <- out[, 3]    sum((Infected - fit)^2)  } # now find the values of beta and gamma that give the  # smallest RSS, which represents the best fit to the data.  # Start with values of 0.5 for each, and constrain them to  # the interval 0 to 1.0 # install.packages("deSolve")  library(deSolve) Opt <- optim(c(0.5, 0.5),               RSS,               method = "L-BFGS-B",               lower = c(0, 0),               upper = c(1, 1)  ) # check for convergence  Opt$message Opt_par <- setNames(Opt$par, c("beta", "gamma"))  Opt_par sir_start_date <- "2020-03-20" # time in days for predictions  t <- 1:as.integer(ymd("2020-04-13") - ymd(sir_start_date)) # get the fitted values from our SIR model  fitted_cumulative_incidence <- data.frame(ode(    y = init, times = t,    func = SIR, parms = Opt_par  )) # add a Date column and the observed incidence data  library(dplyr)  fitted_cumulative_incidence <- fitted_cumulative_incidence %>%    mutate(      Date = ymd(sir_start_date) + days(t - 1),      Country = "Cabo Verde",      cumulative_incident_cases = Infected    ) # plot the data  library(ggplot2)  fitted_cumulative_incidence %>%    ggplot(aes(x = Date)) +    geom_line(aes(y = I), colour = "red") +    geom_point(aes(y = cumulative_incident_cases), colour = "blue") +    labs(      y = "Cumulative incidence",      title = "COVID-19 fitted vs observed cumulative incidence, Belgium",      subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"    ) +    theme_minimal() fitted_cumulative_incidence %>%    ggplot(aes(x = Date)) +    geom_line(aes(y = I), colour = "red") +    geom_point(aes(y = cumulative_incident_cases), colour = "blue") +    labs(      y = "Cumulative incidence",      title = "COVID-19 fitted vs observed cumulative incidence, Belgium",      subtitle = "(Red = fitted incidence from SIR model, blue = observed incidence)"    ) +    theme_minimal() +    scale_y_log10(labels = scales::comma) #We can compute it in R:   Opt_par    R0 <- as.numeric(Opt_par[1] / Opt_par[2])    R0   # time in days for predictions    t <- 1:120   # get the fitted values from our SIR model    fitted_cumulative_incidence <- data.frame(ode(      y = init, times = t,      func = SIR, parms = Opt_par    ))   # add a Date column and join the observed incidence data    fitted_cumulative_incidence <- fitted_cumulative_incidence %>%      mutate(        Date = ymd(sir_start_date) + days(t - 1),        Country = "Cabo Verde",        cumulative_incident_cases = c(Infected, rep(NA, length(t) - length(Infected)))      )   # plot the data    fitted_cumulative_incidence %>%      ggplot(aes(x = Date)) +      geom_line(aes(y = I), colour = "red") +      geom_line(aes(y = S), colour = "black") +      geom_line(aes(y = R), colour = "green") +      geom_point(aes(y = cumulative_incident_cases),                 colour = "blue"      ) +      scale_y_continuous(labels = scales::comma) +      labs(y = "Persons", title = "COVID-19 fitted vs observed cumulative incidence, Belgium") +      scale_colour_manual(name = "", values = c(        red = "red", black = "black",        green = "green", blue = "blue"      ), labels = c(        "Susceptible",        "Recovered", "Observed incidence", "Infectious"      )) +      theme_minimal()   # plot the data    fitted_cumulative_incidence %>%      ggplot(aes(x = Date)) +      geom_line(aes(y = I, colour = "red")) +      geom_line(aes(y = S, colour = "black")) +      geom_line(aes(y = R, colour = "green")) +      geom_point(aes(y = cumulative_incident_cases, colour = "blue")) +      scale_y_log10(labels = scales::comma) +      labs(        y = "Persons",        title = "COVID-19 fitted vs observed cumulative incidence, Belgium"      ) +      scale_colour_manual(        name = "",        values = c(red = "red", black = "black", green = "green", blue = "blue"),        labels = c("Susceptible", "Observed incidence", "Recovered", "Infectious")      ) +      theme_minimal()   fit <- fitted_cumulative_incidence   # peak of pandemic    fit[fit$I == max(fit$I), c("Date", "I")]   # severe cases    max_infected <- max(fit$I)    max_infected / 5   # cases with need for intensive care    max_infected 0.06      # deaths with supposed 0.7% fatality rate    max_infected 0.007

Comment written by Antoine Soetewey on April 16, 2020 09:08:47: Dear José, To update the dataset, you need to use the update_datasets() function or reinstall the {coronavirus} package with devtools::install_github("RamiKrispin/coronavirus").  After doing this, check that you have the latest available data before running your code. Hope this helps. Regards, Antoine

Comment written by José Moniz Fernandes on April 17, 2020 00:59:20: Thanks.

Comment written by José Moniz Fernandes on April 17, 2020 01:12:08: Why # severe cases  max_infected <- max(fit$I)  max_infected / 5 or why five (5)?

Comment written by José Moniz Fernandes on April 17, 2020 01:14:21: Why   # cases with need for intensive care  max_infected * 0.06  ??

Comment written by José Moniz Fernandes on April 17, 2020 01:14:58: Why?    # deaths with supposed 0.7% fatality rate    max_infected * 0.007 ????  Where do have this number for my country?

Comment written by Antoine Soetewey on April 17, 2020 07:26:25: I'll reply to all of your comments in one comment: As written in the post, the 0.7% fatality rate comes from this source. For your country, simply take the same figure or do some research to estimate the fatality rate in your country. Regarding severe cases: dividing by 5 is like multiplying by 0.2, and 20% means that 1 out of every 5 cases is considered as severe case. 6% and 20% of infected cases for intensive care and severe cases (respectively) are estimates. As I already said in another comment, they may be different in your country. For more information where I found these estimates, see the end of this article where the author applies the analyses to Germany. As for fatality rate, if you believe that these estimates are wrong (and it can be the case for countries other than Belgium), do some research and try to find better estimates for your own country. PS: for your information, typing "????" does not add any value compared to "?", except that it looks aggressive. Regards, Antoine

Comment written by José Moniz Fernandes on April 17, 2020 14:27:12: I understood. In Cabo Verde is   # deaths with supposed 1.8% fatality rate  max_infected * 0.018. Thanks.  Where control I the first data in the first plot (summary cases), for example, the first confirmed case is 20/03/20. I don't need  before the 15 of mars.

Comment written by Antoine Soetewey on April 17, 2020 16:00:14: You can exclude the data before March 15 with date >= ymd("2020-03-15") and edit the sir_start_date.

Comment written by José Moniz Fernandes on May 18, 2020 01:26:11: i get the error below when i try to knitt the code +   dplyr::mutate(country = dplyr::if_else(country == "Mainland China", "China", country)) %>% +   dplyr::mutate(country = dplyr::if_else(country == "North Macedonia", "N.Macedonia", country)) %>% +   dplyr::mutate(country = trimws(country)) %>% +   dplyr::mutate(country = factor(country, levels = country)) Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :   namespace rlang 0.4.5 is already loaded, but >= 0.4.6 is required

Comment written by Antoine Soetewey on May 18, 2020 05:18:09: Dear José, Try again after having updated the {rlang} package: install.packages("rlang") library(rlang) and let me know. Best, Antoine

Comment written by José Moniz Fernandes on July 02, 2020 23:28:58:

Dear,

I have this error Error: .onLoad failed in loadNamespace() for 'pkgload', details:   call: NULL   error: package backports does not have a namespace

Comment written by Antoine Soetewey on July 03, 2020 08:39:24:

Dear José,

Never had this issue. Try by updating R and running install.packages("backports"). If this does not work, this user fixed the issue by reinstalling R.

Hope this helps.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Aniqa on May 09, 2020 09:01:38: Dear Antoine, Hope this message finds you well. I am from Bangladesh and was trying to do a coronavirus infection projection for my country using the data we have. I landed on your blog. Thanks, it was really helpful. I have queries regarding your codes. While using the optim function, I found different values for beta and gamma. I think these estimates are not right. They seem to be local maxima not global. When I used initial values 0.5, 0.5 this did not even converge. Then I tried different seeds, and found different estimates. Could you please help, what did I do wrong? Thanks in advance for your help. Best,  Aniqa

Comment written by Antoine Soetewey on June 04, 2020 18:23:41:

Dear Aniqa,

You're right these estimates are local and not global, so estimates depend on initial values.

If it didn't converge, try again by changing the initial values for beta and gamma and see if it converges. You can also change the constraints or the method in the optim() function. The fact that you may find different estimates depending on the initial values proves that the fitting process is not stable. Here is a potential solution.

Hope this helps.

Best, Antoine

Comment written by Aniqa Hossain on July 05, 2020 17:37:38:

Dear Antoine,

Hope you are doing well. Can you share a link with R codes to fit SIR model where we can take account for the other measures like lockdown, mask usage etc?

Many thanks,
Aniqa

AntoineSoetewey commented 3 years ago

Comment written by Aniqa on May 09, 2020 09:01:38: Dear Antoine, Hope this message finds you well. I am from Bangladesh and was trying to do a coronavirus infection projection for my country using the data we have. I landed on your blog. Thanks, it was really helpful. I have queries regarding your codes. While using the optim function, I found different values for beta and gamma. I think these estimates are not right. They seem to be local maxima not global. When I used initial values 0.5, 0.5 this did not even converge. Then I tried different seeds, and found different estimates. Could you please help, what did I do wrong? Thanks in advance for your help. Best,  Aniqa

Comment written by Antoine Soetewey on June 04, 2020 18:23:41: Dear Aniqa, You're right these estimates are local and not global, so estimates depend on initial values. If it didn't converge, try again by changing the initial values for beta and gamma and see if it converges. You can also change the constraints or the method in the optim() function. The fact that you may find different estimates depending on the initial values proves that the fitting process is not stable. Here is a potential solution. Hope this helps. Best, Antoine

Comment written by Aniqa Hossain on July 05, 2020 17:37:38:

Dear Antoine,

Hope you are doing well. Can you share a link with R codes to fit SIR model where we can take account for the other measures like lockdown, mask usage etc?

Many thanks, Aniqa

Comment written by Antoine Soetewey on July 05, 2020 19:42:44:

Dear Aniqa,

See these articles by Tim Churches (especially the most recent ones), they may be of interest to you.

If not, feel free to look at the collection for other articles with the code.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Gavin Coburn on July 08, 2020 00:36:46:

Dear Antoine,

I am trying to figure out what the output for the out part of the following code is (I tried to use R to find it but it isn't giving me the values):

RSS <- function(parameters) {   names(parameters) <- c("beta", "gamma1", "a")   out <- ode(y = init, times = Day, func = SICR, parms = parameters)   fit <- out[, 3]   sum((Confirmed - fit)^2) }

Why did you choose to use [,3] here? 
What does that value represent? 

Thanks,
Gavin Coburn

AntoineSoetewey commented 3 years ago

Comment written by Gavin Coburn on July 08, 2020 00:36:46:

Dear Antoine,

I am trying to figure out what the output for the out part of the following code is (I tried to use R to find it but it isn't giving me the values):

RSS <- function(parameters) {   names(parameters) <- c("beta", "gamma1", "a")   out <- ode(y = init, times = Day, func = SICR, parms = parameters)   fit <- out[, 3]   sum((Confirmed - fit)^2) }

Why did you choose to use [,3] here?  What does that value represent? 

Thanks, Gavin Coburn

Comment written by Antoine Soetewey on July 08, 2020 07:59:19:

Dear Gavin,

Here is a preview of the output of out. As you can see, with out[, 3] I simply select the third column (variable I), which is the number of infected people predicted by the model (\hat{I}(t) in this equation).

Hope this helps.

Best,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Gavin Coburn on July 08, 2020 00:36:46: Dear Antoine, I am trying to figure out what the output for the out part of the following code is (I tried to use R to find it but it isn't giving me the values): RSS <- function(parameters) {   names(parameters) <- c("beta", "gamma1", "a")   out <- ode(y = init, times = Day, func = SICR, parms = parameters)   fit <- out[, 3]   sum((Confirmed - fit)^2) } Why did you choose to use [,3] here?  What does that value represent?  Thanks, Gavin Coburn

Comment written by Antoine Soetewey on July 08, 2020 07:59:19:

Dear Gavin,

Here is a preview of the output of out. As you can see, with out[, 3] I simply select the third column (variable I), which is the number of infected people predicted by the model (\hat{I}(t) in this equation).

Hope this helps.

Best, Antoine

Comment written by Gavin Coburn on July 08, 2020 20:58:29:

That helps a lot!
Thanks!

AntoineSoetewey commented 3 years ago

Comment written by Gavin Coburn on July 08, 2020 00:36:46: Dear Antoine, I am trying to figure out what the output for the out part of the following code is (I tried to use R to find it but it isn't giving me the values): RSS <- function(parameters) {   names(parameters) <- c("beta", "gamma1", "a")   out <- ode(y = init, times = Day, func = SICR, parms = parameters)   fit <- out[, 3]   sum((Confirmed - fit)^2) } Why did you choose to use [,3] here?  What does that value represent?  Thanks, Gavin Coburn

Comment written by Antoine Soetewey on July 08, 2020 07:59:19: Dear Gavin, Here is a preview of the output of out. As you can see, with out[, 3] I simply select the third column (variable I), which is the number of infected people predicted by the model (\hat{I}(t) in this equation). Hope this helps. Best, Antoine

Comment written by Gavin Coburn on July 08, 2020 20:58:29:

That helps a lot! Thanks!

Comment written by Antoine Soetewey on July 09, 2020 13:37:24:

You're welcome!

AntoineSoetewey commented 3 years ago

Comment written by Faizan Ahmad Zargar on August 03, 2020 09:54:38:

I used the following code to estimate beta and gamma  for India:

SIR <- function(time, state, parameters) {   par <- as.list(c(state, parameters))   with(par, {     dS <- -beta I S / N     dI <- beta I S / N - gamma I     dR <- gamma I     list(c(dS, dI, dR))   }) } # devtools::install_github("RamiKrispin/coronavirus") library(coronavirus) data(coronavirus) %>% <- magrittr::%>% # extract the cumulative incidence df <- coronavirus %>%   dplyr::filter(country == "India") %>%   dplyr::group_by(date, type) %>%   dplyr::summarise(total = sum(cases, na.rm = TRUE)) %>%   tidyr::pivot_wider(     names_from = type,     values_from = total   ) %>%   dplyr::arrange(date) %>%   dplyr::ungroup() %>%   dplyr::mutate(active = confirmed - death - recovered) %>%   dplyr::mutate(     confirmed_cum = cumsum(confirmed),     death_cum = cumsum(death),     recovered_cum = cumsum(recovered),     active_cum = cumsum(active)   ) # put the daily cumulative incidence numbers for India from #  Jan 30 to Aug 2 into a vector called Infected library(lubridate) sir_start_date <- "2020-01-30" sir_end_date <- "2020-08-02" Infected <- subset(df, date >= ymd(sir_start_date) & date <= ymd(sir_end_date))$active_cum # Create an incrementing Day vector the same length as our # cases vector Day <- 1:(length(Infected)) # now specify initial values for N, S, I and R N <- 1381196835 init <- c(   S = N - Infected[1],   I = Infected[1],   R = 0 ) # define a function to calculate the residual sum of squares # (RSS), passing in parameters beta and gamma that are to be # optimised for the best fit to the incidence data RSS <- function(parameters) {   names(parameters) <- c("beta", "gamma")   out <- ode(y = init, times = Day, func = SIR, parms = parameters)   fit <- out[, 3]   sum((Infected - fit)^2) } # now find the values of beta and gamma that give the # smallest RSS, which represents the best fit to the data. # Start with values of 0.5 for each, and constrain them to # the interval 0 to 1.0 # install.packages("deSolve") library(deSolve) Opt <- optim(c(0.5, 0.5),   RSS,   method = "L-BFGS-B",   lower = c(0, 0),   upper = c(1, 1) ) # check for convergence Opt$message

when I check for convergence, it shows me an error,  "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH".

However when i reduce the value of N (the total population of India) ten times, it works OK. Can anyone help me please????

AntoineSoetewey commented 3 years ago

Comment written by Faizan Ahmad Zargar on August 03, 2020 09:54:38:

I used the following code to estimate beta and gamma  for India:

SIR <- function(time, state, parameters) {   par <- as.list(c(state, parameters))   with(par, {     dS <- -beta I S / N     dI <- beta I S / N - gamma I     dR <- gamma I     list(c(dS, dI, dR))   }) } # devtools::install_github("RamiKrispin/coronavirus") library(coronavirus) data(coronavirus) %>% <- magrittr::%>% # extract the cumulative incidence df <- coronavirus %>%   dplyr::filter(country == "India") %>%   dplyr::group_by(date, type) %>%   dplyr::summarise(total = sum(cases, na.rm = TRUE)) %>%   tidyr::pivot_wider(     names_from = type,     values_from = total   ) %>%   dplyr::arrange(date) %>%   dplyr::ungroup() %>%   dplyr::mutate(active = confirmed - death - recovered) %>%   dplyr::mutate(     confirmed_cum = cumsum(confirmed),     death_cum = cumsum(death),     recovered_cum = cumsum(recovered),     active_cum = cumsum(active)   ) # put the daily cumulative incidence numbers for India from #  Jan 30 to Aug 2 into a vector called Infected library(lubridate) sir_start_date <- "2020-01-30" sir_end_date <- "2020-08-02" Infected <- subset(df, date >= ymd(sir_start_date) & date <= ymd(sir_end_date))$active_cum # Create an incrementing Day vector the same length as our # cases vector Day <- 1:(length(Infected)) # now specify initial values for N, S, I and R N <- 1381196835 init <- c(   S = N - Infected[1],   I = Infected[1],   R = 0 ) # define a function to calculate the residual sum of squares # (RSS), passing in parameters beta and gamma that are to be # optimised for the best fit to the incidence data RSS <- function(parameters) {   names(parameters) <- c("beta", "gamma")   out <- ode(y = init, times = Day, func = SIR, parms = parameters)   fit <- out[, 3]   sum((Infected - fit)^2) } # now find the values of beta and gamma that give the # smallest RSS, which represents the best fit to the data. # Start with values of 0.5 for each, and constrain them to # the interval 0 to 1.0 # install.packages("deSolve") library(deSolve) Opt <- optim(c(0.5, 0.5),   RSS,   method = "L-BFGS-B",   lower = c(0, 0),   upper = c(1, 1) ) # check for convergence Opt$message

when I check for convergence, it shows me an error,  "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH".

However when i reduce the value of N (the total population of India) ten times, it works OK. Can anyone help me please????

Comment written by Antoine Soetewey on August 03, 2020 15:41:30:

Dear Faizan,

Try again by changing the initial values for beta and gamma (the values 0.5) and/or the upper constraints (the values 1) and see if it converges. You can also try by changing the method in the optim() function. (Run ?optim to see other optimization methods.) The fact that you may find different estimates depending on the initial values or the N proves that the fitting process is not stable. Here is a potential solution.

Hope this helps.

Best,
Antoine

rcsmit commented 3 years ago

Hello. Don't you mix up the two R0? I did the same, and it took me a lot of time See here

AntoineSoetewey commented 3 years ago

Hello. Don't you mix up the two R0? I did the same, and it took me a lot of time See here

Dear Rene,

Just to make sure I understand your point correctly:

So those three concepts are different.

Hope this helps.

Regards, Antoine

rcsmit commented 3 years ago

Hello

Thank you for your comment. If I understand well:

R_e on time=0 is also called R_0.

R_e = R_0 * (S/N)

So we might have to distinguish the former rho ('relative removal rate' by Baily, 1957) in the SIR model and the former z ('basic reproduction rate' by Bartlett, 1955), the R-number at time=0 in a growth model

AntoineSoetewey commented 3 years ago

Hello

Thank you for your comment. If I understand well:

R_e on time=0 is also called R_0.

Yes. See this article: "The basic reproduction number (R0) is the reproduction number when there is no immunity from past exposures or vaccination, nor any deliberate intervention in disease transmission. We refer to R as an effective reproduction number when there is some immunity or some intervention measures are in place."

This is the reason that:

R_e = R_0 * (S/N)

The effective reproduction number R_e is indeed the basic reproduction number times the fraction of the population that is susceptible (source).

So we might have to distinguish the former rho ('relative removal rate' by Baily, 1957) in the SIR model and the former z ('basic reproduction rate' by Bartlett, 1955), the R-number at time=0 in a growth model

rcsmit commented 3 years ago

Thanks again! Is there any way that we can link the SIR model as you describe with a model that describes the growth as I made in https://share.streamlit.io/rcsmit/covidcases/main/number_of_cases_interactive.py.

My goal is/was to enter the average number of days infectious (1/gamma) and the R0/Re@t=0

Now my immunization is calculated on the base of a decline R number - R= R *[1-(I/N)]

AntoineSoetewey commented 3 years ago

Thanks again! Is there any way that we can link the SIR model as you describe with a model that describes the growth as I made in https://share.streamlit.io/rcsmit/covidcases/main/number_of_cases_interactive.py.

My goal is/was to enter the average number of days infectious (1/gamma) and the R0/Re@t=0

Now my immunization is calculated on the base of a decline R number - R= R *[1-(I/N)]

I am not an expert in epidemiological models so I prefer to refrain from answering these questions which require more advanced knowledge. This is to avoid giving wrong or inaccurate information. I am sure other more knowledgeable people will be able to help on that matters.

Great app by the way!

rcsmit commented 3 years ago

Thanks for your compliments. I want to make it as perfect as I can, with all my limited knowledge and parameters :) Have a nice day!

AntoineSoetewey commented 3 years ago

Thanks for your compliments. I want to make it as perfect as I can, with all my limited knowledge and parameters :) Have a nice day!

It already looks quite complete. I'll check it out again in the near future in case you managed to implement other features!

Have a nice day too!