OHDSI / ResultModelManager

RMM is an R package designed to handle common ohdsi results data management functions by providing a common API for data model migrations and definitions
https://ohdsi.github.io/ResultModelManager/
Apache License 2.0
3 stars 3 forks source link

Generalize `uploadResults()` for use in other packages #13

Closed msuchard closed 1 year ago

msuchard commented 3 years ago

I'd like to propose that we generalize the signature for uploadResults() so that this function can be (re-)used in other packages. The proposal is to change this signature to:

uploadResults <- function(connectionDetails,  # NOTE: THIS CANNOT BE NULL (CURRENT DEFAULT)
                          schema,
                          zipFileName,
                          forceOverWriteOfSpecifications = FALSE,
                          purgeSiteDataBeforeUploading = TRUE,
                          tempFolder = tempdir(),
                          specification = getResultsDataModelSpecifications()) # CHANGE HERE

then other packages (like LegendT2dm could provide their own specification arguments.

msuchard commented 3 years ago

If you (@schuemie @gowthamrao) agree, I'll make the changes. Let me know.

msuchard commented 3 years ago

or move this function (and its dependencies) into, say, OhdsiRTools and then write:

CohortDiagnostics::uploadResults <- function(connectionDetails,  # NOTE: THIS CANNOT BE NULL (CURRENT DEFAULT)
                          schema,
                          zipFileName,
                          forceOverWriteOfSpecifications = FALSE,
                          purgeSiteDataBeforeUploading = TRUE,
                          tempFolder = tempdir()) {

  OhdsiRTools::uploadResults(connectionDetails, 
                          schema,
                          zipFileName,
                          forceOverWriteOfSpecifications,
                          purgeSiteDataBeforeUploading,
                          tempFolder,
                          specification = getResultsDataModelSpecifications()) 
}
gowthamrao commented 3 years ago

i love it! OhdsiRTools sounds like a great new home.

We can target cohort diagnostics 2.2 if you like

schuemie commented 3 years ago

I agree that our approach to uploading the results is becoming a repeating pattern that warrants extraction to a reusable function. One thing to keep in mind is that the function does need to know what the primary key field(s) of a table are. So we should make it a hard requirement that folks provide a results model specification like this one. The advantage is that these specs can be used both to create the table structure, and to generate the data model documentation.

I'm not sure about OhdsiRTools as its new home though. Right now OhdsiRTools is a package only for package developers, not package users. This is also the reason why OhdsiRTools is currently not in HADES. Maybe we could create a new OhdsiStudies package? (Or OhdsiStudyPackages?)

I propose to stick to Postgres for now. Uploading large amounts of data to existing tables is highly optimized in DatabaseConnector for Postgres , but not so much for other platforms (with the possible exception of RedShift when bulk uploading is configured).

schuemie commented 3 years ago

(An alternative destination might be OhdsiSharing)

leeevans commented 3 years ago

I agree that OhdsiSharing is a good home for this function and I'd be happy to start working on implementing it there.

leeevans commented 3 years ago

Should we also add an (OHDSI network) siteId parameter? I believe it could be useful in a couple ways:

a) Use it to create a new database schema for the site - if we have a use case for creating separate site schemas? b) Use it to populate a site id column in the generated results tables - downstream processes can use that column to merge the data across sites into a single combined table.

The OHDSI cloud admin can assign a unique OHDSI network site id along with providing the database userid/password credentials to the new site user responsible for uploading the results.

If we do want to assign a unique OHDSI network site id, we could perhaps use the same approach used for assigning java namespaces?

https://en.wikipedia.org/wiki/Reverse_domain_name_notation

msuchard commented 3 years ago

I'm certainly in favor of OhdsiSharing as a good home for uploadResults. My suggestion would be to first port over the existing functionality (to remove lots of code duplication in many places) and then add additional functionality while maintaining backwards compatibility.

uploadResults is currently agnostic to table columns (a table can have a siteId or not depending on its specification).

A function to create a new database schema would (and currently is, e.g. CohortDiagnostics::createResults...) separate from the uploading functionality.

gowthamrao commented 3 years ago

@leeevans can we transfer the issue from Cohort Diagonstics to OhdsiSharing?

Once its added to OhdsiSharing, i can remove it from Cohort Diagnostics

leeevans commented 3 years ago

@msuchard - just a quick status update - I've been working on encapsulating the results functionality (postgresql results schema tables creation from results model spec csv file & DDL SQL file and uploadResults logic) into the OhdsiSharing R package. I'm hoping to have a pull request for review by the end of the week.

msuchard commented 3 years ago

Thanks and great, @leeevans . Here are some functions that I've written along these lines as a stop-gap:

https://github.com/ohdsi-studies/LegendT2dm/blob/34afe5bc9606b0e2bee518505e51a46dd75f37ae/R/BackendDatabase.R#L47

https://github.com/ohdsi-studies/LegendT2dm/blob/34afe5bc9606b0e2bee518505e51a46dd75f37ae/R/BackendDatabase.R#L28

https://github.com/ohdsi-studies/LegendT2dm/blob/34afe5bc9606b0e2bee518505e51a46dd75f37ae/R/BackendDatabase.R#L90

https://github.com/ohdsi-studies/LegendT2dm/blob/34afe5bc9606b0e2bee518505e51a46dd75f37ae/R/BackendDatabase.R#L362

leeevans commented 3 years ago

Thanks for the link @msuchard - I've been working from the below set of functions in CohortDiagnostics which is quite similar. I'll work on incorporating the best of both in OhdsiSharing.

https://github.com/OHDSI/CohortDiagnostics/blob/master/R/ResultsDataModel.R

gowthamrao commented 2 years ago

Hi @msuchard @leeevans i was wondering if we had made progress on this effort? Reason i ask is -- i am looking for options to upload data to postgres for OHDSI phenotype library - that has a large results file.

Is it useful to remove indexes/constraints etc before uploading and then put back the indexes? Is that something we can think of as part of this effort? I have read things like setting off some logging (ALTER TABLE SET UNLOGGED) helps too?

leeevans commented 2 years ago

Hi @gowthamrao I made some progress on this. I can help you with the results file upload to postgres. Let's discuss on Skype on Wednesday morning.

anthonysena commented 2 years ago

Hi all - just wanted to note that I'm currently working on this functionality so I'll provide any updates/questions on this thread. Thanks!

anthonysena commented 1 year ago

In speaking with @azimov, I'd like to propose that we move this issue & work over to the OHDSI/ResultModelManager package.

If there are no objections, @leeevans can you transfer this issue to the other repository? From there I'll update my PR to work in the RMM package.

Tagging @schuemie for awareness and any feedback.

leeevans commented 1 year ago

@anthonysena the issue has been transferred

azimov commented 1 year ago

This has now been released in the latest version of the package