OHDSI / DataQualityDashboard

A tool to help improve data quality standards in observational data science.
https://ohdsi.github.io/DataQualityDashboard
Apache License 2.0
145 stars 96 forks source link

cdmField check fails in Oracle #162

Closed kirenotneb closed 2 years ago

kirenotneb commented 4 years ago

The cdmField check fails in Oracle due to the double quotes present in the query.

[Level: FIELD] [Check: cdmField] [CDM Table: CARE_SITE] [CDM Field: care_site_name] Error executing SQL: java.sql.SQLSyntaxErrorException: ORA-00904: "care_site_name": invalid identifier  An error report has been created at  output/RDW OMOP/errors/FIELD_cdmField_CARE_SITE_care_site_name.txt

The query that DQD executes:

SELECT num_violated_rows, CASE WHEN denominator.num_rows = 0 THEN 0 ELSE 1.0*num_violated_rows/denominator.num_rows END AS pct_violated_rows,
  denominator.num_rows as num_denominator_rows
FROM (SELECT num_violated_rows FROM (SELECT  CASE when  COUNT("care_site_name") = 0 then 0
      else 0
     END as num_violated_rows
    FROM RDW_OMOP.CARE_SITE cdmTable
    GROUP BY 1) violated_rows
) violated_row_count,
(SELECT 1 as num_rows
FROM DUAL) denominator
;

The innermost query is:

SELECT  CASE when  COUNT("care_site_name") = 0 then 0
      else 0
     END as num_violated_rows
    FROM RDW_OMOP.CARE_SITE cdmTable
    GROUP BY 1;

refers to the column name care_site_name in lower case, but our database it is capitalized CARE_SITE_NAME due to the manner how Oracle interprets the DDL for table creation.

This appears to have also been an issue with PostgreSQL as well (#151 ).

clairblacketer commented 4 years ago

@alondhe can you take a look at this for Oracle? I don't have an Oracle environment to test this in.