Sync Strategus modules and lock file

mdlavallee92 commented 5 months ago

Add function to look up latest strategus modules and instantiate them
Add function that pulls latest HADES lock file (if function doesn't already exist)

anthonysena commented 5 months ago

@mdlavallee92 let me know if we can connect on the work you are doing here as I think there are some resources that we can share to help in this effort.

mdlavallee92 commented 5 months ago

Here is how I am looking to use strategus within Ulysses...

Ulysses initializes the repo using a standard directory structure
Ulysses sets INSTANTIATED_MODULES_FOLDER as sys var and checks if strategus modules have been loaded or download latest
template a suggested strategus pipeline (minimal CohortGenerator, CohortDiagnostics) that I can reuse from project to project. As mentioned in #16
pull Hades wide lock file for directory to capture package dependencies
run series of tasks (strategus + non-modularized analytics) across my db assets

These are some of the functions I have in mind for this issue so far, both of which I have in the develop branch.

getHadesWideLock(version = "2023Q3") # place latest renv lock into project repo

# identify modules folder store.
setStrategusInstantiatedFolder(folderName = "Strategus_Modules")  # if the folder doesn't exist, create it and set the environment variable
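As a rough illustration of the second helper, here is a minimal base R sketch. It is not the code in the develop branch: the function body and the `projectPath` argument are assumptions made for this example, and the only names taken from the thread are the function name, `folderName`, and the INSTANTIATED_MODULES_FOLDER variable.

```r
# Hypothetical sketch of setStrategusInstantiatedFolder(); not the Ulysses implementation.
# `projectPath` is an assumed argument added for illustration.
setStrategusInstantiatedFolder <- function(folderName = "Strategus_Modules",
                                           projectPath = getwd()) {
  folderPath <- file.path(projectPath, folderName)
  # Create the folder (and any missing parents) only if it is not already there
  if (!dir.exists(folderPath)) {
    dir.create(folderPath, recursive = TRUE)
  }
  # Store an absolute path so it does not depend on the working directory
  folderPath <- normalizePath(folderPath, winslash = "/")
  # Set the environment variable named in this thread for the current R session
  Sys.setenv(INSTANTIATED_MODULES_FOLDER = folderPath)
  invisible(folderPath)
}
```

Note that `Sys.setenv()` only affects the running session; persisting the value (for example in a project-level .Renviron) would be a separate design choice.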

Additionally, I was hoping to add a function that essentially runs Strategus:::instantiateModule up-front, separate from the execution run. The idea being I want to maintain my instantiated modules in my folder that I reference regardless of the study being run. So the process would be:

A) No Strategus installed locally: make the Strategus folder -> instantiate the latest modules

B) Strategus previously installed: check the module versions in the folder -> download the latest modules
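The check in step B could be sketched roughly as follows. Everything here is an assumption for illustration: `findMissingModules` is a hypothetical helper, the module names and versions are placeholders, and the "<module>_<version>" subfolder naming is a guess about the folder layout rather than something confirmed in this thread.

```r
# Hypothetical helper: report which modules from a module list are not yet in the folder.
# ASSUMPTION: each instantiated module lives in a subfolder named "<module>_<version>"
# with the leading "v" of the version dropped; adjust if your layout differs.
findMissingModules <- function(moduleList, modulesFolder) {
  expected <- paste0(moduleList$module, "_", sub("^v", "", moduleList$version))
  present <- dir.exists(file.path(modulesFolder, expected))
  moduleList[!present, c("module", "version"), drop = FALSE]
}

# Placeholder module list standing in for a real "latest modules" table
moduleList <- data.frame(
  module = c("ModuleA", "ModuleB"),
  version = c("v1.0.0", "v2.0.0")
)

# Simulate a folder where only ModuleA has been instantiated
modulesFolder <- file.path(tempdir(), "Strategus_Modules")
dir.create(file.path(modulesFolder, "ModuleA_1.0.0"), recursive = TRUE)

notInstalled <- findMissingModules(moduleList, modulesFolder)
```

An empty result would correspond to "nothing to download"; any rows returned are the modules that still need instantiating.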

Make sense?

anthonysena commented 5 months ago

Thanks for writing this up @mdlavallee92! I had a slightly different idea in mind but I think our thinking overlaps in certain places. Let me build upon what you put together:

> Ulysses initializes the repo using a standard directory structure

> Ulysses sets INSTANTIATED_MODULES_FOLDER as sys var and checks if strategus modules have been loaded or download latest

> template a suggested strategus pipeline (minimal CohortGenerator, CohortDiagnostics) that I can reuse from project to project. As mentioned in https://github.com/OHDSI/Ulysses/issues/16

This is where I think we can inject a pre-made script of sorts based on previous studies such as createStrategusAnalysisSpecification.R.

I'll also note that Strategus has a sample analysis specification that we can make use of https://github.com/OHDSI/Strategus/blob/main/inst/testdata/analysisSpecification.json. More on this below.

> pull Hades wide lock file for directory to capture package dependencies

I'm not aware of such a function but agree that this would be useful and aligns to some of the discussion from https://github.com/OHDSI/Strategus/pull/114.

> run series of tasks (strategus + non-modularized analytics) across my db assets

So far, so good. Then you mentioned:

> Additionally, I was hoping to add a function that essentially runs Strategus:::instantiateModule up-front, separate from the execution run. The idea being I want to maintain my instantiated modules in my folder that I reference regardless of the study being run.

Agreed! So to highlight a few things that might be helpful, I've started working on a Docker container for Strategus and I also wanted to make sure the latest modules are installed. To do this, I'm using the analysis specification that I have embedded into Strategus and running code like:

sampleAnalysisSpecifications <- system.file("testdata/analysisSpecification.json", package = "Strategus")
analysisSpecifications <- ParallelLogger::loadSettingsFromJson(
  fileName = sampleAnalysisSpecifications
)
Strategus::ensureAllModulesInstantiated(analysisSpecifications)

Also worth mentioning that Strategus has a function to list the latest modules and the corresponding HADES package:

Strategus::getModuleList()
#> # A tibble: 8 × 7
#>   module version remoteRepo remoteUsername moduleType mainPackage mainPackageTag
#>   <chr>  <chr>   <chr>      <chr>          <chr>      <chr>       <chr>         
#> 1 Chara… v0.5.0  github.com OHDSI          cdm        Characteri… v0.1.3        
#> 2 Cohor… v0.2.0  github.com OHDSI          cdm        CohortDiag… v3.2.5        
#> 3 Cohor… v0.3.0  github.com OHDSI          cdm        CohortGene… v0.8.1        
#> 4 Cohor… v0.4.0  github.com OHDSI          cdm        CohortInci… v3.3.0        
#> 5 Cohor… v0.3.0  github.com OHDSI          cdm        CohortMeth… v5.2.0        
#> 6 Patie… v0.3.0  github.com OHDSI          cdm        PatientLev… v6.3.6        
#> 7 SelfC… v0.4.1  github.com OHDSI          cdm        SelfContro… v5.1.1        
#> 8 Evide… v0.6.0  github.com OHDSI          results    EvidenceSy… v0.5.0

Created on 2024-02-07 with reprex v2.0.2

A few other points we'll need to address with the current (v0.x) Strategus approach:

Keyring setup: we're aiming to get rid of the keyring dependency but for now we might want to include some mechanism for setting up & confirming the keyring settings
Results Viewer: we currently have some helper scripts to upload results and to view them using the OHDSI Shiny viewer. We might consider including this in the resulting R Project produced by Ulysses.

mdlavallee92 commented 5 months ago

Awesome @anthonysena! Think we are generally converging between these two packages. Any overlapping stuff I think we can sort out and deprecate out of Ulysses as we get further along.

The general idea is Ulysses is like your "workbench" organizer. I can set up strategus if I want, I can set up internal scripts if I want. But at the end of the day my study repo needs to be organized and easy to pick up and follow. No matter if it's an internal study or something we send around via ohdsi-studies or ehden. There are so many pieces to executing an ohdsi pipeline I feel like organization is key. Ability to reliably pull my atlas assets and keep track of my study cohorts, write up and maintain my statistical analysis plan, and boot up strategus to execute if I need it.

Yes, I like what you are depicting in the sampleAnalysisSpecifications snippet. This way I am always working with my instantiated modules. This would also avoid some of these source calls to github raw, which can get out of date pretty quick the more these modules evolve. I don't see the point in wrapping existing methods in Strategus as Ulysses functions so perhaps we just provide a vignette on how to boot Strategus within Ulysses.

For keyring, I have a startup script in Ulysses to set these up, much of which I liberally took from the SOS challenge 😝. I agree, keyring gets confusing and it's not worth the trouble debugging in a network study, from experience. I feel that if a user wants keyring, fine, but they should set it up. Too much of a hassle to enforce that from an open-source standpoint using Ulysses or Strategus. For credentials I like to use config.yml files that store all credentials and then cycle through this file to run the same study across database assets. Ulysses has functions to set this config.yml file in your project repo and import credentials from a master credentials.csv file you store locally.
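To make the "cycle through config.yml" idea concrete, here is a minimal sketch. It assumes the yaml package is installed and reads the file directly with `yaml::read_yaml()`, which may differ from how Ulysses itself loads the file. The block names, fields, and the `runStudy` function are made up for illustration, and no real credentials appear in the example.

```r
# Write a small example config.yml; block names and fields are made up for illustration
configFile <- file.path(tempdir(), "config.yml")
writeLines(c(
  "dbA:",
  "  dbms: postgresql",
  "  cdmDatabaseSchema: cdm_a",
  "dbB:",
  "  dbms: redshift",
  "  cdmDatabaseSchema: cdm_b"
), configFile)

# Hypothetical stand-in for the study code that runs against one database
runStudy <- function(dbName, settings) {
  sprintf("%s: %s on %s", dbName, settings$cdmDatabaseSchema, settings$dbms)
}

# Each top-level block becomes one named list of settings
configs <- yaml::read_yaml(configFile)

# Run the same study once per database block in the file
results <- vapply(names(configs), function(dbName) {
  runStudy(dbName, configs[[dbName]])
}, character(1))
```

Keeping one block per database means adding a new data source is a config change rather than a code change.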

As for the results viewer, yes, I agree with exposing these templates via Ulysses. I made an issue for it: #18

I will connect with you more offline

anthonysena commented 5 months ago

> The general idea is Ulysses is like your "workbench" organizer.

I guess I was thinking that Ulysses would be used by someone who wants to set up an R Project to run Strategus and it would provide them with the necessary resources (R scripts, etc) to get them started. Do I have this right? I think setting up the environment to work within this R project is necessary and useful but wanted to ask this specifically.
