OHDSI / DataQualityDashboard

A tool to help improve data quality standards in observational data science.
https://ohdsi.github.io/DataQualityDashboard
Apache License 2.0

Error: replacement has 2 rows, data has 1 #55

Closed by tarunshah 5 years ago

tarunshah commented 5 years ago

I'm getting this error while running DataQualityDashboard.

Connecting using SQL Server driver
Processing check description: measurePersonCompleteness
Processing check description: cdmField
[Level: FIELD] [Check: cdmField] [CDM Table: COST] [CDM Field: revenue_code_source_value] Error executing SQL:
com.microsoft.sqlserver.jdbc.SQLServerException: Invalid column name 'revenue_code_source_value'.
An error report has been created at  D:/Tarun/OHDSI/DataQualityDashboard/OHDSI_Source/errors/FIELD_cdmField_COST_revenue_code_source_value.txt
Processing check description: isRequired
Processing check description: cdmDatatype
Processing check description: isPrimaryKey
Processing check description: isForeignKey
Processing check description: fkDomain
Processing check description: fkClass
Processing check description: isStandardValidConcept
Processing check description: standardConceptRecordCompleteness
Processing check description: sourceConceptRecordCompleteness
Processing check description: sourceValueCompleteness
Processing check description: plausibleValueLow
Processing check description: plausibleValueHigh
Processing check description: plausibleDuringLife
Processing check description: plausibleValueLow
Processing check description: plausibleValueHigh
Processing check description: plausibleGender
Connecting using SQL Server driver
Error in `$<-.data.frame`(`*tmp*`, "THRESHOLD_VALUE", value = c(5L, 5L : 
  replacement has 2 rows, data has 1

However, when I ran only the checks up to SourceValueCompleteness, the run completed successfully. Any idea why this is happening?

clairblacketer commented 5 years ago

Hi @tarunshah, this typically happens when the underlying CSV file contains two thresholds for the same data quality check. I will take a look at this on my side, but is it possible that there is a duplicate record somewhere in your CSV?
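
For anyone who wants to check their own copy for duplicate records, here is a minimal sketch of one way to do it. It is not part of DataQualityDashboard. The column names (`cdmTableName`, `cdmFieldName`, `thresholdValue`) and the sample rows are assumptions made for illustration only, so substitute whichever columns identify a single check in your file.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(csv_text, key_columns):
    """Return {key_tuple: count} for every key that appears more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[col] for col in key_columns) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data: the column names and rows are assumptions,
# not copied from the real threshold files.
SAMPLE = """cdmTableName,cdmFieldName,thresholdValue
PERSON,gender_concept_id,5
PERSON,gender_concept_id,5
COST,cost_id,0
"""

duplicates = find_duplicate_keys(SAMPLE, ["cdmTableName", "cdmFieldName"])
print(duplicates)  # {('PERSON', 'gender_concept_id'): 2}
```

To run it against a real file, read the file contents into a string and pass them as `csv_text`. An empty result means no key was repeated for the columns you chose.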

lrasmus commented 5 years ago

@clairblacketer - thanks for investigating. I'm running into the same error message on my setup as well. Also using SQL Server, and I will note that I'm using an older version of CDM (not 5.3.1). Wasn't sure if that played a role in the error. I will look into the CSV - I was just using what was pulled down for the package.

conor-d-mcgrath commented 5 years ago

Hi @clairblacketer, I've just tried running this and I have the same error. I didn't change any CSV files.

Michael-Khladkovsky commented 5 years ago

Hi @clairblacketer. I have the same issue during the computation (using Ubuntu with Redshift). After removing SourceValueCompleteness and all records after it from OMOP_CDMv5.3.1_Check_Descriptions.csv, everything works fine.

clairblacketer commented 5 years ago

Thanks all for bringing this up - I think I see the problem in the latest commit so I am testing a fix and will deploy shortly.

clairblacketer commented 5 years ago

Please pull down the master branch and try again; it should be fixed now. If you made any changes to the CSV files that you want to keep, remember to save them somewhere else so that your changes aren't overwritten.