Possibly: additional backends

zsusswein commented 4 months ago

Currently we support gam and bam. In my mind these are the big two. Potential additional options are mgcv::ginla() or brms::brm(), but I don't have a clear vision for these yet.

EDIT: Also {mvgam}

The questions to resolve here are:

Who would the additional backend be for? What's the use-case? What workflows would we try to support?
- Current best answer is to allow expanded functionality with more complex models. These aren't required for barebones initial modeling, but opportunities to expand once UKHSA's main functionality is in place.
How would they mechanically be implemented? The lift would be pretty light for an additional mgcv function but brms would likely take a bit more rejiggering. Is it enough to switch to an internal S3 class structure? Do we need to think more about a public model fitting function? I'm resistant to it, but would listen if we could open up some important missing functionality. Obviously the answer to (2) is downstream of (1).
- Still not sure on this one. Had a nice talk with Dan McDonald at the $R_t$ Collabathon where he talked through the tidymodels workflow for formula creation. I'm still not totally sure how to implement that but thought it was interesting and potentially we could do it there?

[ ] #34
[ ] #35
[ ] Support for mgcv::ginla()

seabbs commented 4 months ago

In a world where you wanted to generalise this to more complex models then I think brms support would be good but if we are happy shutting the door on that (without major reworking) i think its fine it make a statement that this is a mgcv extension package and work from there

zsusswein commented 4 months ago

I'm definitely not opposed in principle, but I think we'd need a clear vision of the scenarios we want to enable and what it means for functionality vis a vis EpiNow2/Epinowcast. If we go full bayes, why not implement it as a forward renewal model? If we do, how is this different than a re-implementation of EpiNow2? If we don't, we aren't we implementing the current state of the art?

Basically, I think we should have a clear idea of what we're doing here and when we want to hand things off to the Epinowcast/EpiNow2 universe. I think there could be a compelling case for joint delay estimation and forward convolution, which will be a major weakness here. I don't know where to draw the line though, and that makes me uneasy.

zsusswein commented 4 months ago

An additional thought here: it may be worth exploring gamm and/or gamm4. I imagine the many thousands of timeseries case will be challenging and computationally intensive and we could squeeze out some better performance with the lme4 backend vs the standard nlme.

I think I have a clearer vision for those use-cases (fast approximate models implemented with a focus on real-time performance) than I do for the brms case right now?

seabbs commented 4 months ago

If we go full bayes, why not implement it as a forward renewal model? If we do, how is this different than a re-implementation of EpiNow2? If we don't, we aren't we implementing the current state of the art?

The difference would be in handling delays I think which is tricky but you still could. What you would not really be able to include would be all the uncertainty handling stuff so in that sense that would be the unique place EpiNow2 would occupy.

To me the line is everything up to and including the renewal process and leave everything beyond that to other tools at least for now.

I think the main motivation for brms right now is:

Latent models that are not spline driven or at least not only spline driven
Ability to generate stan code from a consistent tool you understand the output of and adapt for your own use case

The longer term use case is ability to extend to include more complexity or to allow the user to do so (as we do in epidist)

All that being said I don't really think you should try and support brms in any currently planned versions. Rather I think you should make design decisions that don't constrain you to just having mgcv as a backend (which I think the current implementation does if you want the code to be at all clean). In some sense its not a problem as you could update with breaking changes but if wanting to avoid that then worth thinking about this all now

zsusswein commented 4 months ago

Dropping this here for more thought on gamm/gamm4

kgostic commented 4 months ago

Chiming in:

I think the main goal here is to have something we could possibly pull off the shelf and/or share with partners, ideally before the end of the calendar year. I think it's important to focus on one thing that works, rather than generality and flexibility right now.

With that in mind...

Who would the additional backend be for? What's the use-case? What workflows would we try to support?

I think the main use case should be to support alternative backends if they help maintain stability and reasonable speed across many signals, including for large counts. I think that major extensions/generalizations of the methods (e.g. moving away from spline-based methods) is out of scope here.
Following on that, I could see trying to support gamm4() for the lme4 backend and maybe mgcv::ginla(), as these are backends that might help with speed or convergence without much of a lift. From the `gamm' documentation, it's not clear to me that the lme backend would really help with speed or stability unless we stick entirely to identity link functions.

How would they mechanically be implemented? The lift would be pretty light for an additional mgcv function but brms would likely take a bit more rejiggering. Is it enough to switch to an internal S3 class structure? Do we need to think more about a public model fitting function? I'm resistant to it, but would listen if we could open up some important missing functionality. Obviously the answer to (2) is downstream of (1).

I think we should skip brms. Do you still need to talk through implementation for the other two? If it's going to be a big lift there, I could even see sticking with gam and bam and calling it a day.

seabbs commented 4 months ago

I think we should skip brms. Do you still need to talk through implementation for the other two?

To focus this issue I don't think it's about which backends to eventually include. I think it's about should the package be designed now to handle backends from different packages (with a shared interface) or just limit to mgcv.

For the reasons already given I think it's compelling to design now for multiple backends but not actually implement any of them any time soon

seabbs commented 4 months ago

including for large counts.

I don't think the size of counts interacts with the current proposed model methodology so any backend should scale well here.****

seabbs commented 4 months ago

convergence

I'm not sure they will help with this but they will offer other modelling options and maybe more stable than mgcv in some circumstances (noting stable != Converged or reliable imo)?

seabbs commented 4 months ago

convergence

I'm not sure they will help with this but they will offer other modelling options and maybe more stable than mgcv in some circumstances (noting stable != Converged or reliable imo).

seabbs commented 4 months ago

. I think it's important to focus on one thing that works, rather than generality and flexibility right now.

I think this is the conversation to have. If you want at any point to think about other package backends I think will save a lot of pain to do the tooling to allow that seamlessly now (otherwise many breaking changes . In most instances we are taking about things like allowing users to pass ... args or overwrite hard coded internal settings.

If you don't want to support beyond mgcv, which I think is a totally reasonable place to set the scope then as is will work quite well (with a little more user options imo). The only issue is changing from this position has the most negative impact of available choices once a lot of work has been done

seabbs commented 4 months ago

For the question of which backends maybe they need their own issues and potential schedule. I do think there are very compelling reasons to think about brms here but also see the counter argument about scope creep.

seabbs commented 2 months ago

Seee #34 and #35 for specific backend issues.

CDCgov / cfa-gam-rt

Possibly: additional backends #21