CDCgov / cfa-gam-rt

R package for real-time Rt estimation with penalized splines
https://cdcgov.github.io/cfa-gam-rt/
Apache License 2.0
8 stars 0 forks source link

#8 [3/4]: Implement model fitting with mgcv #20

Open zsusswein opened 2 weeks ago

zsusswein commented 2 weeks ago

[!IMPORTANT] This PR a trial for using a stacked PR workflow. Do not merge.

Extends #18 to fit a GAM using the supplied formula and data. There are two implemented backends, mgcv::gam() and mgcv::bam(). Dispatch to the appropriate backend is handled via an internal S3 class -- the methods are exported but the generic is private so the methods are still internal to the package. Decent default arguments are supplied, with optional user-supplied arguments passed via a list.

There's also a new warning for legal but unwise parameterization.

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (5ee9a2c) to head (0283df9).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #20 +/- ## ========================================== Coverage 100.00% 100.00% ========================================== Files 3 6 +3 Lines 87 268 +181 ========================================== + Hits 87 268 +181 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

zsusswein commented 2 weeks ago

Having everything based on main makes it quite a pain. Are you just doing this to get the CI? Why not just update the CI to trigger on PRs against branches that aren't main.

Yep this is just for CI. I'll switch the base before opening the PR. I think before updating CI rules we want to check in on whether the whole stacked PR thing is working.

My only concern here is about how opinionated this is getting - i.e. very. We might want that and in that case its fine but I do think we should be taking choices here that still allow people to use it for other things even if the default behaviour is very opinionated.

Agreed -- I think the push and pull here is really helpful. I don't want to be overly rigid with my view of the target user and target use-case, but also I want to avoid stepping into the territory of EpiNow2/Epinowcast/EpiAware. Those are closer to custom model foundries than I think this package should be. That doesn't we should prevent any/all customization (and the current implementation is likely too constraining), but I do want to hear a convincing use-case for e.g. custom model formula and arguments. What would the user flow be? Who is this user? Does this package provide enough benefit over handrolling in mgcv or jumping right to EpiNow2 for us to optimize for their use-case?

My other concern is there are a few notes about implementing other backends etc but what is coded here doesn't really seem set up for success there (unless success is lots of internal character string based switches which I assume it isn't). Is there an issue scoping out how extensions would be implemented?

The current flow is here is that RtGam() fits the model and is the class constructor. I'd strongly prefer we avoid the public data constructor -> public model constructor flow, but perhaps an internal S3 class that switches would be possible? Issue to discuss is https://github.com/CDCgov/cfa-gam-rt/issues/21

seabbs commented 2 weeks ago

I think before updating CI rules we want to check in on whether the whole stacked PR thing is working.

Agree but from a review perspective the current setup of having to wade through all the old commits is very bad which makes it hard to know if stacked PRs are working

Those are closer to custom model foundries than I think this package should be. That doesn't we should prevent any/all customization (and the current implementation is likely too constraining), but I do want to hear a convincing use-case for e.g. custom model formula and arguments. What would the user flow be? Who is this user? Does this package provide enough benefit over handrolling in mgcv or jumping right to EpiNow2 for us to optimize for their use-case?

Yes I agree. I think the main user is NNH right? The question is do we want to ever on the fly do things like use a Poisson observation model because something isn't working or try some settings tweaks in bam because again something is a problem.

If you have to leave the package to make any of these tiny tweaks you are using all the benefits of the good defaults and the nice pre and post-processing

constructor flow, but perhaps an internal S3 class that switches would be possible? Issue to discuss is https://github.com/CDCgov/cfa-gam-rt/issues/21

zsusswein commented 2 weeks ago

Sure, but this PR isn't open for review yet as I note at the top?

seabbs commented 2 weeks ago

Sure, but this PR isn't open for review yet as I note at the top?

Fair enough. @seabbs out for now then!

zsusswein commented 1 week ago

... vs a list + does this allow actual default args to fit_model vs an internal list?

Addressing here

Having the class check be in fit_model.default vs when preparing the class. Doing so allows for user extension at the cost of control and potentially some opaque errors (though not at the moment).

I think the counterpoint to consider is what to do if/when we need to move to S3 for other steps (I'm thinking predict and maybe formula would be needed). Centralizing the check for backends into one place adds clarity I'd think, at the cost of extensibility as you noted.

seabbs commented 1 week ago

(I'm thinking predict and maybe formula would be needed)

If you did do this and backends didn't have methods these would by default throw an informative error (but not a incredibly pretty one) if you just had no default method