epiverse-trace / epichains

Methods for simulating and analysing the sizes and lengths of infectious disease transmission chains from branching process models
https://epiverse-trace.github.io/epichains/
Other
6 stars 2 forks source link

Make `likelihood()` return a list of all options #207

Open jamesmbaazam opened 8 months ago

jamesmbaazam commented 8 months ago

likelihood() has an EXTREMELY hard to read/understand return value, i.e.,

If log = TRUE

A joint log-likelihood (sum of individual log-likelihoods), if individual == FALSE (default) and obs_prob == 1 (default), or

A list of individual log-likelihoods, if individual == TRUE and obs_prob == 1 (default), or

A list of individual log-likelihoods (same length as nsim_obs), if individual == TRUE and 0 <= obs_prob < 1, or

A vector of joint log-likelihoods (same length as nsim_obs), if individual == FALSE and 0 <= obs_prob < 1 (imperfect observation).

If log = FALSE, the same structure of outputs as above are returned, except that likelihoods, instead of log-likelihoods, are calculated in all cases. Moreover, the joint likelihoods are the product, instead of the sum, of the individual likelihoods.

I've thought about this and think we could return most of these in a named list and explain what each name means in the @return value section.

Benefits:

This is linked to #133.

@sbfnk @Bisaloo @chartgerink @pratikunterwegs Thoughts?

Bisaloo commented 8 months ago

Thanks for sharing!

I'm still struggling a little bit to wrap my head around the different outputs and the proposal here. A very helpful first step may be to provide examples that show all output types in likelihood() documentation.

If I understand correctly the proposal here, it seems a bit redundant to have both the log and non log likelihood in the same object, especially when it's so fast to go from one to the other. Same for individual vs summed likelihoods.

I would probably suggest to be a bit more opinionated and indeed remove the individual and log argument, but stick to one set of values for these (e.g., always log = TRUE & individual = TRUE). This way, you get the benefits you mention here and keep a simpler and relatively stable output (just changing based on obs_prob == 1).

sbfnk commented 8 months ago

This also relates to previous comments in #42 and #64. I'm not completely clear how much complexity this would add but I think that implementing S3 methods for predict and logLik would be potentially the best option. If I understand the logLik class correctly then it's a vector with an attribute df (the number of parameters estimated) which could be used here in line with @Bisaloo's comments.