epiverse-trace / episoap

[Not published - under active development] A Store of Outbreak Analytics Pipelines Provided as Rmarkdown Report Templates
https://epiverse-trace.github.io/episoap/
Other
4 stars 2 forks source link

Creating epidist objects allowing flexibility for distribution parameters #112

Closed CarmenTamayo closed 4 months ago

CarmenTamayo commented 6 months ago

On PR #100 the usage of epiparameter is updated in the transmissibility report. As a way to solve issue #109 , an epidist object is created so that epiparameter::discretise can be used also when the user is manually providing parameters and distribution of the SI.

To my knowledge, when creating an epidist, it's necessary to specify the names of the parameters of the distribution, ie if it's gamma, it's necessary to type prob_distribution_params = c(shape = x, scale = x).

However the template requires these parameters to be flexible, as the distribution can be provided by the user as well... I've been trying out different ways to do this but can't figure it out, I'm looking for a way to, instead of specifying shape/scale or meanlog/sdlog, do something like prob_distribution_parameters = c(param1, param2), that can be applied to all different distributions.

@joshwlambert is there an easy way to do this with epiparameter? or would we have to write a function for this?

Thank you!

joshwlambert commented 6 months ago

I will have to give this a bit of thought but my initial guess is this cannot be handled by {epiparameter}. The package handles variable input parameters for each distribution and can standardise these (e.g. a user can input shape and rate for a gamma and {epiparameter} can convert to shape and scale). So if the user (in this case the {episoap} user) doesn't provide parameter names it does not know if they are shape and rate or shape and scale (in the case of the gamma distribution).

You could document that the parameters input by the user need to be of a certain parameterisation, but this would be needed for each distribution accepted by {episoap} which is far from ideal and not something I would advise.

Happy to take a look into this next week and try and find a nice solution.

CarmenTamayo commented 6 months ago

Thanks Josh, I agree that that wouldn't be a good solution, let me know when you have had a thought and I'll update the script

Bisaloo commented 6 months ago

At the moment, if users provide their own distributions, they are expected to pass family, mean and sd, which epiparameter should be able to handle fine.

Do you have a specific example of code that you anticipate being problematic?

CarmenTamayo commented 6 months ago

The issue is at the point where epidist objects are created using epiparameter. When using this function, it's necessary to specify not only the distribution (which is included in the parameter list with params$si_dist) but also the specific name of the distribution parameters, e.g., shape and scale or meanlog and sdlog, which means that this function cannot be generalised for any distribution provided by the user at the moment, e.g., in the example I provided in the issue I raised initially: epidist( disease = params$disease_name, epi_dist = "serial_interval", prob_distribution = params$si_dist, prob_distribution_params = c(shape = si_params[1], scale = si_params[2]) <- here if the distribution was not a gamma when using params$si_dist but a lnorm, there is not a way to account for this and an error is generated

As suggested, something like prob_distribution_params = c(param1 = si_params[1], param2 = si_params[2]) is what I was looking for, but this is not possible with epiparameter, as Josh commented

CarmenTamayo commented 6 months ago

This also happens for instance in the EpiNow2 chunk, when converting distribution parameters to summary statistics , with epiparameter this would be: convert_params_to_summary_stats(distribution = "gamma", shape = si[1] , scale = si[2] ), where distribution = params$si_dist could be used, but the shape and scale cannot be changed to a generic form, e.g., param1 = si[1] , param2 = si[2] bc the package requires these to be named with the specific name for each distribution

Bisaloo commented 6 months ago

The issue is at the point where epidist objects are created using epiparameter. When using this function, it's necessary to specify not only the distribution (which is included in the parameter list with params$si_dist) but also the specific name of the distribution parameters, e.g., shape and scale or meanlog and sdlog, which means that this function cannot be generalised for any distribution provided by the user at the moment, e.g., in the example I provided in the issue I raised initially: epidist( disease = params$disease_name, epi_dist = "serial_interval", prob_distribution = params$si_dist, prob_distribution_params = c(shape = si_params[1], scale = si_params[2]) <- here if the distribution was not a gamma when using params$si_dist but a lnorm, there is not a way to account for this and an error is generated

As suggested, something like prob_distribution_params = c(param1 = si_params[1], param2 = si_params[2]) is what I was looking for, but this is not possible with epiparameter, as Josh commented

The solution for this should be to not convert the parameters beforehand but pass directly the mean and sd to epidist() and let epiparameter deal with conversions internally if necessary.

This means we know that we always have sd and mean and can pass them as names.

CarmenTamayo commented 6 months ago

Thanks Hugo, that's great, I'm now using this instead:

si_epidist <- epidist( disease = params$disease_name, epi_dist = "serial_interval", prob_distribution = params$si_dist, summary_stats = create_epidist_summary_stats(mean = params$si_mean,sd=params$si_sd), auto_calc_params = TRUE)

In the case of converting parameters to summary statistics, like I mentioned here:

This also happens for instance in the EpiNow2 chunk, when converting distribution parameters to summary statistics , with epiparameter this would be: convert_params_to_summary_stats(distribution = "gamma", shape = si[1] , scale = si[2] ), where distribution = params$si_dist could be used, but the shape and scale cannot be changed to a generic form, e.g., param1 = si[1] , param2 = si[2] bc the package requires these to be named with the specific name for each distribution

What do you think would be the best solution?

Bisaloo commented 6 months ago

For now, you can use do.call(), which allows to pass arbitrary named value to a function:

library(epiparameter)

# This returns a lnorm dist
edist <- epidist_db(
  disease = "COVID-19",
  epi_dist = "serial interval",
  author = "Nishiura",
  single_epidist = TRUE
)
#> Using Nishiura H, Linton N, Akhmetzhanov A (2020). "Serial interval of novel
#> coronavirus (COVID-19) infections." _International Journal of
#> Infectious Diseases_. doi:10.1016/j.ijid.2020.02.060
#> <https://doi.org/10.1016/j.ijid.2020.02.060>.. 
#> To retrieve the short citation use the 'get_citation' function
family(edist)
#> [1] "lnorm"

do.call(
  convert_params_to_summary_stats,
  c(
    distribution = family(edist),
    as.list(get_parameters(edist))
  )
)
#> $mean
#> [1] 4.7
#> 
#> $median
#> [1] 3.999869
#> 
#> $mode
#> [1] 2.896954
#> 
#> $var
#> [1] 8.41
#> 
#> $sd
#> [1] 2.9
#> 
#> $cv
#> [1] 0.6170213
#> 
#> $skewness
#> [1] 2.085973
#> 
#> $ex_kurtosis
#> [1] 8.617709

# This returns a gamma dist
edist <- epidist_db(
  disease = "ebola",
  epi_dist = "serial interval",
  single_epidist = TRUE
)
#> Using WHO Ebola Response Team, Agua-Agum J, Ariyarajah A, Aylward B, Blake I,
#> Brennan R, Cori A, Donnelly C, Dorigatti I, Dye C, Eckmanns T, Ferguson
#> N, Formenty P, Fraser C, Garcia E, Garske T, Hinsley W, Holmes D,
#> Hugonnet S, Iyengar S, Jombart T, Krishnan R, Meijers S, Mills H,
#> Mohamed Y, Nedjati-Gilani G, Newton E, Nouvellet P, Pelletier L,
#> Perkins D, Riley S, Sagrado M, Schnitzler J, Schumacher D, Shah A, Van
#> Kerkhove M, Varsaneux O, Kannangarage N (2015). "West African Ebola
#> Epidemic after One Year — Slowing but Not Yet under Control." _The New
#> England Journal of Medicine_. doi:10.1056/NEJMc1414992
#> <https://doi.org/10.1056/NEJMc1414992>.. 
#> To retrieve the short citation use the 'get_citation' function
family(edist)
#> [1] "gamma"

do.call(
  convert_params_to_summary_stats,
  c(
    distribution = family(edist),
    as.list(get_parameters(edist))
  )
)
#> $mean
#> [1] 14.2
#> 
#> $median
#> [1] 0.2873807
#> 
#> $mode
#> [1] 7.709859
#> 
#> $var
#> [1] 92.16
#> 
#> $sd
#> [1] 9.6
#> 
#> $cv
#> [1] 0.6760563
#> 
#> $skewness
#> [1] 1.352113
#> 
#> $ex_kurtosis
#> [1] 2.742313

Created on 2023-12-11 with reprex v2.0.2

CarmenTamayo commented 6 months ago

Thanks Hugo, I'll add it to the template for now

joshwlambert commented 6 months ago

Thanks @Bisaloo for the solutions, I agree with your suggestions to use epidist() to do the heavily lifting where possible and use do.call() for arbitrarily named vectors.

@CarmenTamayo can this issue be closed?

CarmenTamayo commented 6 months ago

Thank you both, I'm going to leave the issue open until I make all the changes and make sure the do.call() function works

sbfnk commented 6 months ago

The solution for this should be to not convert the parameters beforehand but pass directly the mean and sd to epidist() and let epiparameter deal with conversions internally if necessary.

It might not matter in this case but just to mention that this will only work as long as the parameters don't have uncertainty (in which case the conversion is no longer straightforward).

CarmenTamayo commented 5 months ago

Waiting to close #113 to solve this issue