Modify results functions

gowthamrao commented 1 year ago

I would like to request functions that help manipulate cohort diagnostics output. There are many reasons that justify these functions.

Change cohort id: ability to go through the results in the output zip file and replace oldCohortId with newCohortId, i.e. take a data frame object with newCohortId and oldCohortId fields.
Change database id: ability to go through the results in the output zip file and replace oldDatabaseId, oldDatabaseName with newDatabaseId, newDatabaseName.
Filter results: ability to filter a large result set by cohortId and databaseId, i.e. the function should take an array of cohortIds and databaseIds.
Compare cohort sql hash between zip file outputs and report if there are two cohorts that have the same sql hash but different cohortIds.
Join two or more zip files with results into one zip file.
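
To make the first few requests concrete, here is a rough sketch of the row-level operations in Python (not the package's R API). It assumes each results table has already been read into a list of dicts, and the column names `cohort_id`, `database_id` and `sql_hash` are placeholders for illustration, as are the function names; none of them are taken from the actual output format.

```python
def remap_ids(rows, column, id_map):
    """Return copies of rows with values in `column` replaced via id_map.

    Values missing from id_map are left unchanged.
    """
    return [{**row, column: id_map.get(row[column], row[column])} for row in rows]


def filter_rows(rows, cohort_ids=None, database_ids=None):
    """Keep rows whose ids are in the given collections (None means no filter).

    The column names cohort_id / database_id are assumed, not confirmed.
    """
    kept = []
    for row in rows:
        if cohort_ids is not None and row.get("cohort_id") not in cohort_ids:
            continue
        if database_ids is not None and row.get("database_id") not in database_ids:
            continue
        kept.append(row)
    return kept


def find_hash_conflicts(rows):
    """Return {sql_hash: sorted cohort ids} for hashes shared by more than one cohort id.

    `sql_hash` is a hypothetical column name for the cohort SQL hash.
    """
    ids_by_hash = {}
    for row in rows:
        ids_by_hash.setdefault(row["sql_hash"], set()).add(row["cohort_id"])
    return {h: sorted(ids) for h, ids in ids_by_hash.items() if len(ids) > 1}
```

The functions return new lists rather than editing rows in place, so the original results stay untouched if a mapping turns out to be wrong.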

gowthamrao commented 1 year ago

Justification:

integrate results from multiple sites that have run diagnostics using different definitions
split large studies into smaller ones
fix issues in labels e.g. use of space in databaseId

azimov commented 1 year ago

Whilst more validation is good, most of these things should be set at the time the study is designed. Doing (and allowing) ad hoc comparisons to merge bits of data is bad practice. I don't think it's a good idea to allow users to merge random results together - these are things that can lead to massive interpretation errors. It's much better practice to force investigator discipline at the study design step than to have utilities to merge badly collected data.

gowthamrao commented 1 year ago

Forcing investigator discipline is much harder in network studies, where contributions come from various sources. The use case I am interested in is the following:

A contributor is contributing to the OHDSI Phenotype library. The requirements for submissions are met. The submission would involve executing the cohort (as developed in their local instance with local atlasId and databaseId).
Once the peer review is complete and it is decided to accept this cohort, we need to integrate the submission into https://github.com/ohdsi-studies/PhenotypeLibraryDiagnostics . To integrate the initial contribution with the existing output from the PhenotypeLibraryDiagnostics study, we need to be able to extract the accepted cohorts (if the submission also contained unapproved cohortIds) and re-id the cohortId and, if needed, the databaseId. I can't ask them to re-run just because the OHDSI Phenotype Library has now assigned the submitted cohort a new id (it's hard enough to get data partners to run it once).
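
As an illustration of the re-id step on a submitted zip file, a minimal sketch in Python could look like the following. It assumes the zip contains CSV files and that the id column is called `cohort_id`; both the column name and the function names `reid_zip` / `_remap_csv` are made up for this example and are not part of CohortDiagnostics.

```python
import csv
import io
import zipfile


def _remap_csv(text, id_map, column):
    """Rewrite one CSV document, replacing values in `column` via id_map."""
    reader = csv.DictReader(io.StringIO(text))
    # Leave files without the target column (or without a header) untouched.
    if not reader.fieldnames or column not in reader.fieldnames:
        return text
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = id_map.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()


def reid_zip(zip_bytes, id_map, column="cohort_id"):
    """Return new zip bytes with `column` remapped in every CSV member.

    id_map maps old id strings to new id strings; other members are copied as-is.
    """
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name.endswith(".csv"):
                data = _remap_csv(data.decode("utf-8"), id_map, column).encode("utf-8")
            dst.writestr(name, data)
    return out_buf.getvalue()
```

The same approach would apply to databaseId by passing a different `column` and mapping. A real implementation would also need to handle ids embedded in other places (for example JSON cohort definitions), which this sketch ignores.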

This issue is not limited to PhenotypeLibraryDiagnostics study. I have encountered the need to mix and match outputs from other studies.

gowthamrao commented 1 year ago

Change cohort name

Sorry missed this one

OHDSI / CohortDiagnostics

Modify results functions #999