OHDSI / CohortDiagnostics

An R package for performing various cohort diagnostics.
https://ohdsi.github.io/CohortDiagnostics
40 stars 45 forks source link

*_source_concept_id are not required in CDM, but are required in included_source_concept #1081

Closed gowthamrao closed 7 months ago

gowthamrao commented 9 months ago

This is the results data model of cohort diagnostics that is saying source_concept_id is required. These are fields like condition_source_concept_id, visit_source_concept_id, procedure_source_concept_id

https://github.com/OHDSI/CohortDiagnostics/blob/85bae994c71aafade7598c7a2a5bfb3eb619e5b1/inst/settings/resultsDataModelSpecification.csv#L138

CohortDiagnostics DDL also has it is NOT NULL https://github.com/OHDSI/CohortDiagnostics/blob/85bae994c71aafade7598c7a2a5bfb3eb619e5b1/inst/sql/sql_server/CreateResultsDataModel.sql#L340

*_source_concept_id are NULL'able in CDM.

https://github.com/OHDSI/CommonDataModel/blob/55c4d7a934e0c85d64503a89326cf7454209421a/inst/ddl/5.4/sql_server/OMOPCDM_sql_server_5.4_ddl.sql#L42

https://github.com/OHDSI/CommonDataModel/blob/55c4d7a934e0c85d64503a89326cf7454209421a/inst/ddl/5.4/sql_server/OMOPCDM_sql_server_5.4_ddl.sql#L85

Impact: I am seeing this when i upload results to postgres

rror in private$callback(data, index) : java.sql.BatchUpdateException: Batch entry 237 INSERT INTO phenotypelibrary.included_source_concept (database_id,cohort_id,concept_set_id,concept_id,source_concept_id,concept_subjects,concept_count) VALUES('cdm_iqvia_pharmetrics_plus_v',233.0,4,262.0,NULL,43721843.0,64794671.0) was aborted: ERROR: null value in column "source_concept_id" of relation "included_source_concept" violates not-null constraint

Here the concept_id is ERIP ie. https://athena.ohdsi.org/search-terms/terms/262 It probably does not have a value in the corresponding visit_source_concept_id field in the CDM

To reprdouce: take the VALUES in the error above and upload to postgres with CohortDiagnsotics results data model.

anthonysena commented 8 months ago

Just want to bump this issue as we encountered this recently when using CohortDiagnostics. As Gowtham has pointed out, those *_source_concept_ids are not required so I think the option is to cast the NULL values to 0 if you'd like to retain the primary key for that table.

azimov commented 7 months ago

The issue is that removing the null constraint violates a primary key - it is perfectly valid, and common, for multiple source_concepts to map to the same standard concept id (e.g. ICD9 and ICD10 codes). Therefore, my solution is to set the value to 0 to allow this to occur.

It may be a change to the CDM convention to set the source concept to the standard concept in these cases?

anthonysena commented 7 months ago

Perhaps modifying the query in places like this:

https://github.com/OHDSI/CohortDiagnostics/blob/76911f757b9db74f7fe1c4a7e7405d4d285dc622/inst/sql/sql_server/CohortSourceCodes.sql#L29

to wrap these <domain>_source_concept_id in a ISNULL(<domain>_source_concept_id, 0) would suffice? I think the NULL for <domain>_source_concept_id is meaningful in the context of the row information but in the case of CohortDiagnostics we can cast this to a 0 since keeping the primary key on the results table makes sense to me.