Centralizing modules & renv lock file

anthonysena commented 1 year ago

From discussion with @schuemie, we'd like to update Strategus to have it be the central location for individual analytic modules while also allowing for the flexibility of using external modules as is currently supported. The goal here is to align Strategus's modules & renv.lock file with the planned regular release(s) of HADES. To do this, we discussed some ideas which I'll put forward in this issue.

Strategus would maintain a central lock file that is used by all of the modules. When a study is executed via Strategus, the renv.lock file would be included with the analysis specification and execution settings JSON to provide a complete record of the study. Specifically, the renv.lock file would detail all of the R dependencies (including Strategus), the analysis specification would detail the design choices for the study and the execution settings would detail the specific environment settings for executing the study.
Modules would reside inside of the Strategus package, in the R code folder. We could standardize the names of the module .R files to "Module.R". Additionally, it may be useful to use R6 classes for the modules to provide an interface for the modules and an inheritance structure we can use to centralize specific utility functions that modules may need (i.e. how to get the cohort definitions from the analysis specification).
The analysis specification itself would need an update, presumably we'd remove the version, remoteRepo and remoteUsername attributes in the module specification (shown below). In the case when these attributes are missing, Strategus would source the module from its local set of modules otherwise it would instantiate the module as is done today. https://github.com/OHDSI/Strategus/blob/c701540320f9145e166e16a50967d45a879c4a8c/inst/testdata/analysisSpecification.json#L226-L228

Tagging @ablack3 for awareness.

ablack3 commented 1 year ago

Thanks for sharing this plan @anthonysena. Overall it makes sense.

while also allowing for the flexibility of using external modules

The one requirement I’d like to add (or highlight if it already exists) is that Strategus can be extended with new modules without ever requiring a new OHDSI repo or pull request on an existing OHDSI repo. This would give me and others the ability to experiment with extensions without needing permission/access rights. I think this is the idea behind the open-closed principle.

My other comment is that I might steer away from using R6 for modules and instead suggest defining generic functions that modules must implement (essentially the module interface). These generic functions could be defined in the Strategus package using S3 or S4 (or even the new R7) and then implemented in other packages that extend Strategus functionality. I have not used R6 much but what I think R6 is good for is maintaining and mutating state in an application as well as providing an alternative to R’s normal “copy on modify” behavior. I think the functional OOP approach implemented in S3, S4, and R7 would work well in this case and put the focus on the generic functions a module (which is likely to be somewhat synonymous with an analytic R package) needs to implement to work with Strategus. That’s my two cents though and you should go with what makes the most sense to you and the other developers working on Strategus.

anthonysena commented 1 year ago

The one requirement I’d like to add (or highlight if it already exists) is that Strategus can be extended with new modules without ever requiring a new OHDSI repo or pull request on an existing OHDSI repo. This would give me and others the ability to experiment with extensions without needing permission/access rights. I think this is the idea behind the open-closed principle.

Thanks @ablack3 for noting this. I'd like to point out this is currently possible given the design of Strategus. To illustrate this point a bit more, I'll use an example. In Strategus, I'm putting together a sample analysisSpecification.json document that we can use to test Strategus and run various modules. Here is an excerpt from that document:

https://github.com/OHDSI/Strategus/blob/01e63f7b160801c012a48dedbc432f1c55a9cfc6/inst/testdata/analysisSpecification.json#L225-L228

As shown above, each module currently has a reference to the location of the module itself. So, if I wanted to use a module outside of OHDSI, I could reference it and make use of it in the same specification. To showcase how that is currently done, here is an excerpt from the vignette:

https://github.com/OHDSI/Strategus/blob/01e63f7b160801c012a48dedbc432f1c55a9cfc6/vignettes/CreatingAnalysisSpecification.Rmd#L76-L97

The block of code referenced above is used to create the CohortGeneratorModule settings that go into the analysisSpecification.json. As shown here, the elements of the analysisSpecification.json are taken from the function cohortGeneratorModuleSpecifications. Digging into this even deeper, if we take a look at cohortGeneratorModuleSpecifications you will see that the metadata is part of that function. Also worth noting that this function is auto-generated using code in ModuleMaintenance.R

My other comment is that I might steer away from using R6 for modules and instead suggest defining generic functions that modules must implement (essentially the module interface). These generic functions could be defined in the Strategus package using S3 or S4 (or even the new R7) and then implemented in other packages that extend Strategus functionality. I have not used R6 much but what I think R6 is good for is maintaining and mutating state in an application as well as providing an alternative to R’s normal “copy on modify” behavior. I think the functional OOP approach implemented in S3, S4, and R7 would work well in this case and put the focus on the generic functions a module (which is likely to be somewhat synonymous with an analytic R package) needs to implement to work with Strategus. That’s my two cents though and you should go with what makes the most sense to you and the other developers working on Strategus.

This would be worthwhile to discuss - I'm currently thinking that it may be desirable to create a StrategusModules package that contains the individual modules and supporting functions/classes to keep the main Strategus repo focused on executing the analyses. Happy to hear your thoughts on S3/S4/R6/R7 as I haven't done a lot with these OOP bits in R. Tagging @azimov @chrisknoll @levalle @mdlavallee92 as they may have some experience to share on this topic.

azimov commented 1 year ago

Modules would reside inside of the Strategus package, in the R code folder. We could standardize the names of the module .R files to "Module.R". Additionally, it may be useful to use R6 classes for the modules to provide an interface for the modules and an inheritance structure we can use to centralize specific utility functions that modules may need (i.e. how to get the cohort definitions from the analysis specification).

I don't think that a centralized module location would be particularly useful or a good idea. On a user's system it should be possible to install multiple versions of the same module, instantiate them and use them in whatever executions they need to.

For a given study it may also be desirable to call a module that uses only proprietary code that will never see the light of day outside the organisation.

I think a much more desirable solution would be to allow users to install whatever modules they want into a local system path. This could contain many different modules of different versions in both a global location that the user can backup or delete at will. The module's in these folders would be able to list their source location (e.g. a github repository) but requiring a github doesn't aid reproducibility in my view (it's just a convenient way of sharing code).

This will also simplify development because module developers would be able to build and test in local environments. It's already quite painful to develop modules because they require a github release. Linking them all to a single github repository would make that even harder to push and test small changes to.

A centralized repository will also become extremely difficult to maintain over time. We'll basically be implementing something like CRAN and I don't really like the idea of any one organisation maintaining this because I fear it will hamper adoption amongst the community. Study packages had problems but they also have the really appealing principle that anyone could just develop them and share them.

If user's can't do that then why wouldn't they just maintain a single renv.lock file and say "use this" with a script.R file they maintain?

The one requirement I’d like to add (or highlight if it already exists) is that Strategus can be extended with new modules without ever requiring a new OHDSI repo or pull request on an existing OHDSI repo. This would give me and others the ability to experiment with extensions without needing permission/access rights. I think this is the idea behind the open-closed principle.

I agree with this. Modules shouldn't need to be OHDSI owned (or even run on an OMOP CDM, for example). I think this package should limit its' scope to allowing users to orchestrate complex workflows in modules across different renv environments.

I think a simplification would be to model how npm/yarn work in the javascript world. You want to download the packages into a local bundle (or a global space on disk) and then use them in a study.

anthonysena commented 3 months ago

Keeping this open based on discussions with @egillax since we'd like to migrate the remaining modules into Strategus. Specifically:

anthonysena commented 2 months ago

Closing and tracking via #164.

OHDSI / Strategus

Centralizing modules & renv lock file #29