Closed SulevR closed 4 years ago
This is occurring in Sql Server as well.
SQL Server : Microsoft SQL Server 2016 (SP2-CU10) (KB4524334) - 13.0.5492.2 (X64)
Oct 4 2019 19:14:08
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows Server 2016 Datacenter 10.0
R packages: see bottom
with the following fields: CDM_TABLE_NAME CDM_FIELD_NAME CONDITION_ERA CAST(condition_era_start_date AS DATE) CONDITION_ERA CAST(condition_era_end_date AS DATE) CONDITION_OCCURRENCE CAST(condition_start_datetime AS DATE) CONDITION_OCCURRENCE CAST(condition_end_datetime AS DATE) DEVICE_EXPOSURE CAST(device_exposure_start_datetime AS DATE) DEVICE_EXPOSURE CAST(device_exposure_end_datetime AS DATE) DOSE_ERA CAST(dose_era_start_date AS DATE) DOSE_ERA CAST(dose_era_end_date AS DATE) DRUG_ERA CAST(drug_era_start_date AS DATE) DRUG_ERA CAST(drug_era_end_date AS DATE) DRUG_EXPOSURE CAST(drug_exposure_start_datetime AS DATE) DRUG_EXPOSURE CAST(drug_exposure_end_datetime AS DATE) NOTE CAST(note_datetime AS DATE) NOTE_NLP CAST(nlp_datetime AS DATE) OBSERVATION CAST(observation_datetime AS DATE) PERSON CAST(birth_datetime AS DATE) PROCEDURE_OCCURRENCE CAST(procedure_datetime AS DATE) SPECIMEN CAST(specimen_datetime AS DATE) VISIT_DETAIL CAST(visit_detail_start_datetime AS DATE) VISIT_DETAIL CAST(visit_detail_end_datetime AS DATE) VISIT_OCCURRENCE CAST(visit_start_datetime AS DATE) VISIT_OCCURRENCE CAST(visit_end_datetime AS DATE)
Example SELECT num_violated_rows, CASE WHEN denominator.num_rows = 0 THEN 0 ELSE 1.0*num_violated_rows/denominator.num_rows END AS pct_violated_rows FROM ( select num_violated_rows from ( select case when count("CAST(observation_datetime AS DATE)") = 0 then 0 else 0 end as num_violated_rows from omop.dbo.OBSERVATION ) violated_rows ) violated_row_count, ( SELECT 1 as num_rows ) denominator ;
R Packages:
base | 3.2.2 | NULL |
---|---|---|
boot | 1.3-17 | R (>= 3.0.0), graphics, stats |
class | 7.3-13 | R (>= 3.0.0), stats, utils |
cluster | 2.0.3 | R (>= 2.15.0), utils |
codetools | 0.2-14 | R (>= 2.1) |
compiler | 3.2.2 | NULL |
datasets | 3.2.2 | NULL |
doParallel | 1.0.10 | R (>= 2.14.0), foreach(>= 1.2.0), iterators(>= 1.0.0), |
parallel, utils | NULL | compiler, RUnit |
doRSR | 8.0.3 | R (>= 2.5.0), foreach(>= 1.2.0), iterators(>= 1.0.0), RevoScaleR(>= 2.0-0), utils, |
RevoUtils | NULL | NULL |
foreach | 1.4.3 | R (>= 2.5.0) |
foreign | 0.8-65 | R (>= 3.0.0) |
graphics | 3.2.2 | NULL |
grDevices | 3.2.2 | NULL |
grid | 3.2.2 | NULL |
iterators | 1.0.8 | R (>= 2.5.0), utils |
KernSmooth | 2.23-15 | R (>= 2.5.0), stats |
lattice | 0.20-33 | R (>= 3.0.0) |
MASS | 7.3-43 | R (>= 3.1.0), grDevices, graphics, stats, utils |
Matrix | 1.2-2 | R (>= 3.0.1) |
methods | 3.2.2 | NULL |
mgcv | 1.8-7 | R (>= 2.14.0), nlme (>= 3.1-64) |
nlme | 3.1-121 | graphics, stats, R (>= 3.0.0) |
nnet | 7.3-10 | R (>= 2.14.0), stats, utils |
parallel | 3.2.2 | NULL |
pkgXMLBuilder | 1 | NULL |
RevoIOQ | 8.0.3 | R (>= 2.6.2), RUnit (>= 0.4.26), tools |
revoIpe | 1.1 | NULL |
RevoMods | 8.0.3 | R (>= 2.6.2) |
RevoPemaR | 8.0.3 | R (>= 3.1.1), methods |
RevoRpeConnector | 8.0.3 | NULL |
RevoRsrConnector | 8.0.3 | NULL |
RevoScaleR | 8.0.3 | R (>= 3.2.2) |
RevoTreeView | 8.0.3 | NULL |
RevoUtils | 8.0.3 | NULL |
RevoUtilsMath | 8.0.3 | NULL |
rpart | 4.1-10 | R (>= 2.15.0), graphics, stats, grDevices |
RUnit | 0.4.26 | R (>= 2.5.0), utils (>= 2.5.0), methods (>= 2.5.0) |
spatial | 7.3-10 | R (>= 3.0.0), graphics, stats, utils |
splines | 3.2.2 | NULL |
stats | 3.2.2 | NULL |
stats4 | 3.2.2 | NULL |
survival | 2.38-3 | R (>= 2.13.0), graphics, stats |
tcltk | 3.2.2 | NULL |
tools | 3.2.2 | NULL |
translations | 3.2.2 | NULL |
utils | 3.2.2 | NULL |
XML | 3.98-1.1 | R (>= 1.2.0), methods, utils |
Same thing for Amazon Redshift - every check with "CAST([datetime_field] AS DATE)" fails with error column "cast([datetime_field] as date)" does not exist in [cdmTableName].
Yeah, not sure why those quotes are there. Good catch! Fixing in my develop branch with some other fixes, aiming for next week to push to master
Ahh, now I remember, we put the quotes because Postgres was having issues parsing count(offset)
-- offset is a field name in the CDM table note_nlp
Hacky, but I've added a Postgres only route that uses quotes. All other dialects will not use quotes.
Thank you, we need to update this on our Broadsea docker container and remove our workarounds for SQL. Much appreciated Ajit!
So I think what we may do instead is remove the cast statements. That way we should be able to keep the quotes for all dialects as it is really the cast statements that are causing the errors.
@Andreyyiv -- the quotes are fine in Postgres, it was the casting that was causing the issue. I checked this on my Postgres instance
@alondhe the quotes are causing issues in Oracle, e.g. ORA-00904: "person_id": invalid identifier
. The column names in quotes should be in the upper case, e.g. "PERSON_ID"
. Without quotes it will match regardless of the case.
@alondhe the quotes are causing issues in Oracle, e.g.
ORA-00904: "person_id": invalid identifier
. The column names in quotes should be in the upper case, e.g."PERSON_ID"
. Without quotes it will match regardless of the case.
@Andreyyiv - so if we change it to use uppercase for all cdm field names in the queries, but with quotes, it'll work?
@alondhe Oracle creates objects in the upper case by default, and without quotes any case (e.g. mixed) can be used in the query. With quotes, the case must match to the table definition. In our instance the case was not explicitly specified, so either removing quotes or using upper case in quotes should work. However, if during the table creation in some other Oracle instance the specific case pattern was used like "Person_ID"
, the only method to specify this column in the query is "Person_ID"
.
@Andreyyiv -- got it! I think for now, I will change the cdmFieldName values in the queries to be upper case to ensure it works. Can you try the develop branch?
remove.packages("DataQualityDashboard")
restart R session
devtools::install_github("OHDSI/DataQualityDashboard", ref = "develop")
@alondhe I tested the develop
branch. The results are mixed. The Oracle errors associated with the field names are resolved, so this is an improvement. Also the log file contains several errors due to data quality (this is expected, I'll investigate and fix these outside this thread). The last record in the log file is
[Main thread] FATAL DataQualityDashboard $<- Error in ``$<-.data.frame``(``*tmp*``, "DQD_VERSION", value = "1.0.0") : replacement has 1 row, data has 0 Calls: <Anonymous> -> .summarizeResults -> $<- -> $<-.data.frame
(double ``
is actually a single backtick in the original log), and the result JSON file was not generated.
@Andreyyiv -- do you have a record in CDM_SOURCE? Because this error seems to indicate the data frame with that record came back as empty, and then tries to append to it.
@alondhe The CDM_SOURCE
table was empty indeed. I inserted a row there, and now all the cdmField
checks are complete. The JSON file was generated. So the issue is resolved.
Also I've noticed that this helped with the shiny page: previously both Metadata and Results sections were empty even there was the results data in the JSON file, now both sections have the expected contents.
Thank you!
Tried to run it on PostgreSQL database, but most of the SQL scripts end up with error like 'ERROR: column "CAST(birth_datetime AS DATE)" does not exist'.
It seems that the produced SQL is not correct - uses quotes where it should not:
...select case when count("CAST(birth_datetime AS DATE)") = 0 ...
I tested that the same script without quotes works well:
...select case when count(CAST(birth_datetime AS DATE)) = 0 ...
Full stacktrace here: