(Wasn't sure if I should post this here or with CohortGeneratorModule)
The results model currently contains a subset_definition_id field in the cohort_definition table, but this doesn't point to anything in the results model. It would be good if we had information on the subsetting in the database.
A practical example of why this is problematic: We currently cannot link exposures to their indications. I'm currently parsing the settings JSON to get this information, but ideally the results model would be stand-alone.
A simple solution would be to dump the subset JSONs in a table, which would then have the following definition:
subset_definition_id (BIGINT)
json (TEXT)
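As a rough sketch of what this simple option would look like in practice, the snippet below stores one subset definition as raw JSON and then retrieves the cohort IDs it subsets on. Everything here is an assumption for illustration: SQLite stands in for the results database, the table name `cg_subset_definition` is hypothetical, and the JSON is a heavily simplified stand-in for the real subset definition settings.

```python
import json
import sqlite3

# In-memory SQLite stands in for the results database (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table name; the columns follow the definition above.
conn.execute(
    "CREATE TABLE cg_subset_definition ("
    " subset_definition_id BIGINT,"
    " json TEXT)"
)

# Hypothetical, simplified subset definition. The real settings JSON
# contains more fields; only the shape needed for the example is shown.
subset_json = json.dumps({
    "name": "Exposure with indication",
    "subsetOperators": [
        {"subsetType": "CohortSubsetOperator", "cohortIds": [301]}
    ],
})
conn.execute(
    "INSERT INTO cg_subset_definition VALUES (?, ?)", (1, subset_json)
)


def indication_cohort_ids(conn, subset_definition_id):
    """Return the cohort IDs referenced by one subset definition.

    The IDs are only reachable by parsing the JSON on the client side;
    plain SQL cannot join on them.
    """
    row = conn.execute(
        "SELECT json FROM cg_subset_definition"
        " WHERE subset_definition_id = ?",
        (subset_definition_id,),
    ).fetchone()
    definition = json.loads(row[0])
    ids = []
    for operator in definition["subsetOperators"]:
        ids.extend(operator.get("cohortIds", []))
    return ids


indication_ids = indication_cohort_ids(conn, 1)
```

The table makes the results model self-contained, but the link from an exposure cohort to its indication cohort is still hidden inside the `json` column.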
But this would require parsing the JSON to join exposures to indications. If possible we'd like to have a parsed representation in the database, so that table might look something like
subset_definition_id (BIGINT)
sequence_id (INT)
name (TEXT)
subset_type (VARCHAR)
cohort_id_set_id (INT)
cohort_combination_operator (VARCHAR)
with a second table called something like `cg_subset_cohort_id_set` with structure