OHDSI / CohortDiagnostics

An R package for performing various cohort diagnostics.
https://ohdsi.github.io/CohortDiagnostics
41 stars 48 forks source link

Enable CohortDiagnostics to run on other DBMSs in addition to Postgres #317

Closed leeevans closed 2 years ago

leeevans commented 3 years ago

Cohort Diagnostics is a great tool. It would be very useful to be able to run it on other DBMSs in addition to Postgres. If some advanced Cohort Diagnostics functionality depends on Postgres DBMS specific features, then those features could perhaps be disabled when running on other DBMSs.

azimov commented 3 years ago

Cohort diagnostics can run on any OMOP CDM compliant CDM but hosting the Shiny dashboard for large data sets currently only supports postgres. Hosting without postgres is possible with the exported RData objects but this becomes limited with larger data sets.

Postgres was deliberately selected because its open source and supporting multiple database backends may add additional work. We only expect this use case to occur where people want to share the results of a large study with OHDSI, for example across multiple sites in a network study.

We would also almost always expect the postgres instance to be separate from any CDMs as the output of Cohort Diagnostics does not contain patient level data.

gowthamrao commented 3 years ago

@leeevans is there interest to support non postgres based db for the shiny app. Like @azimov said, this only applies to the post computed result set with no person level data that is used by the shiny app

leeevans commented 3 years ago

@azimov thanks for pointing out that the Cohort Diagnostics can be generated (by reading data) from CDMs on any OHDSI (DatabaseConnector) supported DBMS. @gowtham, yes, I guess the scope of this issue is narrowed down to the DBMS support for the cohort diagnostics results database used by the Shiny app.

I see a postgresql dependency in the way the cohort diagnostics results schema tables are created (postgresql only DDL executed) and postgresql database specific upload of the cohort diagnostics results ("PreMerged.Rdata") dataset in the preMergeDiagnosticsFiles() function. There is also a check in the shiny app that stops execution if the database connection DBMS is not 'postgresql'. I know that the scope of this shiny app was to run it only on the OHDSI shiny server postgresql database so this is understandable.

If I could somehow load the 'PreMerged.RData' dataset into a results schema on a Redshift database or a SQL Server database and I removed the postgresql DBMS connection check from the shiny app code would the shiny app work, or are there other additional postgresql dependencies in the shiny app UI/Server code?

Also, is it possible to use the "PreMerged.RData" dataset with the Shiny app or does it require a (postgresql) database? I agree that using a database for the typically large results data is preferred but I just wondered if executing the shiny app using that pre-merged dataset is an option?

gowthamrao commented 3 years ago

is it possible to use the "PreMerged.RData" dataset with the Shiny app or does it require a (postgresql) database? Possible, but as the file size becomes large, the whole file gets read into R memory (maybe for each Shiny instance) - thats probably bad.

are there other additional postgresql dependencies in the shiny app UI/Server code? we chose postgres because we did not want the complexity of SQL translation. if we can support the SQL translation - i.e. make all SQL in the DiagnosticsExplorer shiny app OHDSI SQL compliant - this might work.

Only thing would be - are there any Shiny specific issues with any of the target db

azimov commented 2 years ago

This is now possible in the develop branch