Make `likelihood()` work with `<epichains_summary>`, `<epichains>`, and `<numeric>` objects

jamesmbaazam commented 7 months ago

This PR closes #39.

Please check if the PR fulfills these requirements

- [x] I have read the CONTRIBUTING guidelines
- [ ] A new item has been added to NEWS.md
- [x] Tests for the changes have been added (for bug fixes / features)
- [x] Docs have been added / updated (for bug fixes / features)
- [x] Checks have been run locally and pass

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

A feature

What is the current behavior? (You can also link to an open issue here)

`likelihood()` works only with a numeric vector of chains.

What is the new behavior (if this is a feature change)?

An `<epichains_tree>` or `<epichains_summary>` object can be passed directly to `likelihood()` to estimate the likelihood of observing the chains.

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

Not applicable.

Other information:

~- Do we want to override the other relevant arguments (offspring_dist, stat_max, etc) with the attributes of an <epichains_tree> when passed, or should the user still supply them?~

Currently, if the data contains `Inf` and `stat_max` is `Inf`, it errors because R cannot generate an infinite sequence (see https://github.com/epiverse-trace/epichains/blob/5e6fd4966f1986eead1421f7298d50308768fa56/R/likelihood.R#L104).

@sbfnk, how can we surmount this issue?

Would it make sense to calculate the likelihood using only the finite values and, in the case where `individual = TRUE`, return `NA` for the `Inf` values?
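The failure mode described above can be reproduced in base R. This is only a minimal sketch: `seq_len()` stands in for whatever sequence the linked line builds, which is an assumption and not a copy of the package code.

```r
# Illustrative only: seq_len() is a stand-in for the sequence built in likelihood.R.
stat_max <- Inf

# Capture the error instead of stopping the script.
res <- tryCatch(
  suppressWarnings(seq_len(stat_max)),
  error = function(e) e
)
inherits(res, "error") # TRUE: R cannot build a sequence of infinite length

# With a finite cap, the same call succeeds.
finite_seq <- seq_len(10)
length(finite_seq) # 10
```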

codecov-commenter commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.07%. Comparing base (a7faf3a) to head (8bb0e64). Report is 119 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #213      +/-   ##
==========================================
+ Coverage   99.03%   99.07%   +0.03%
==========================================
  Files           8        8
  Lines         729      755      +26
==========================================
+ Hits          722      748      +26
  Misses          7        7
```


jamesmbaazam commented 7 months ago

@sbfnk Currently, if the data contains `Inf` (more likely when an `<epichains_summary>` is passed) and `stat_max` is `Inf`, it errors because R cannot generate an infinite sequence (see https://github.com/epiverse-trace/epichains/blob/5e6fd4966f1986eead1421f7298d50308768fa56/R/likelihood.R#L103-L104).

Would it make sense to calculate the likelihood using only the finite values and, in the case where `individual = TRUE`, return `NA` where the entry is `Inf`?

sbfnk commented 7 months ago

> @sbfnk Currently, if the data contains `Inf` (more likely when an `<epichains_summary>` is passed) and `stat_max` is `Inf`, it errors because R cannot generate an infinite sequence (see https://github.com/epiverse-trace/epichains/blob/5e6fd4966f1986eead1421f7298d50308768fa56/R/likelihood.R#L103-L104).
>
> Would it make sense to calculate the likelihood using only the finite values and, in the case where `individual = TRUE`, return `NA` where the entry is `Inf`?

I don't think it would be correct to exclude some of the outbreaks from the likelihood.

We can only have data containing `Inf` with a finite `stat_max` in the simulation, right? So it doesn't really make sense to then calculate the likelihood with `stat_max` as `Inf`. Perhaps this should be added to the summary as an attribute anyway? The likelihood function could then read it out and set `stat_max` to the "correct" value.

jamesmbaazam commented 7 months ago

> > @sbfnk Currently, if the data contains `Inf` (more likely when an `<epichains_summary>` is passed) and `stat_max` is `Inf`, it errors because R cannot generate an infinite sequence (see https://github.com/epiverse-trace/epichains/blob/5e6fd4966f1986eead1421f7298d50308768fa56/R/likelihood.R#L103-L104).
> >
> > Would it make sense to calculate the likelihood using only the finite values and, in the case where `individual = TRUE`, return `NA` where the entry is `Inf`?
>
> I don't think it would be correct to exclude some of the outbreaks from the likelihood.
>
> We can only have data containing `Inf` with a finite `stat_max` in the simulation, right? So it doesn't really make sense to then calculate the likelihood with `stat_max` as `Inf`.
>
> Perhaps this should be added to the summary as an attribute anyway?

This is already the case.

> The likelihood function could then read it out and set `stat_max` to the "correct" value.

That makes sense. See the change in 8a35bc7. Is that what you meant?
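For readers following the thread, the idea of reading `stat_max` from the object could be sketched roughly as below. This is not the content of 8a35bc7: the helper `resolve_stat_max()`, the attribute name `"stat_max"`, and the toy object are all hypothetical and only illustrate the approach under discussion.

```r
# Hypothetical helper: prefer a stat_max stored on the object over the argument.
resolve_stat_max <- function(chains, stat_max = Inf) {
  stored <- attr(chains, "stat_max", exact = TRUE)
  if (!is.null(stored)) {
    stat_max <- stored
  }
  # Inf entries only make sense if the simulation was capped at a finite value.
  if (any(is.infinite(chains)) && is.infinite(stat_max)) {
    stop("`chains` contains Inf, so `stat_max` must be finite.", call. = FALSE)
  }
  stat_max
}

# Toy stand-in for a simulated summary that was capped at 10 (assumed attribute name).
sim_summary <- structure(c(1, 3, Inf, 2), stat_max = 10)

resolve_stat_max(sim_summary) # 10, read from the attribute
resolve_stat_max(c(1, 3, 2))  # Inf, plain numeric input keeps the default
```

With this shape, a plain numeric vector containing `Inf` and no finite `stat_max` fails early with a clear message rather than at the point where the sequence is built.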
