OHDSI / Ares

A Research Exploration System
https://ohdsi.github.io/Ares/
Apache License 2.0

AresIndexer::augmentConceptFiles fails with "! object 'CONCEPT_ID' not found" #239

Open lav-patel opened 1 year ago

lav-patel commented 1 year ago

Describe the bug: Running the code from the Ares docs:

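server <- Sys.getenv("PGHOST_PGDATABASE")
user <- Sys.getenv("PGUSER")
password <- Sys.getenv("PGPASSWORD")

# connectionDetails is used below but was not defined in the original post;
# assumed to be created like this (dbms guessed from the PG* variable names)
connectionDetails <- DatabaseConnector::createConnectionDetails(
    dbms = "postgresql",
    server = server,
    user = user,
    password = password
)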

cdmVersion <- "5.4" 
cdmDatabaseSchema <- "omop_from_cdm"
resultsDatabaseSchema <- "omop_edc_results"
numThreads <- 4
sqlOnly <- FALSE
createIndices <- TRUE
outputFolder <- "output"
cdmSourceName <- "omop_from_cdm" # a human readable name for your CDM source
verboseMode <- FALSE # set to TRUE if you want to see activity written to the console
writeToTable <- TRUE # set to FALSE if you want to skip writing to a SQL table in the results schema
checkLevels <- c("TABLE", "FIELD", "CONCEPT") # which DQ check levels to run 
checkNames <- c() # which DQ checks to run?  # Names can be found in https://github.com/OHDSI/DataQualityDashboard/blob/main/inst/csv/OMOP_CDMv5.4_Check_Descriptions.csv
aresDataRoot <- "output/webserver_root/ares/data"

# run achilles
Achilles::achilles(cdmVersion = cdmVersion,
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = cdmDatabaseSchema,
    resultsDatabaseSchema = resultsDatabaseSchema
    #numThreads = numThreads,
    #sqlOnly = sqlOnly,
    #createIndices = createIndices
)
# obtain the data source release key (naming convention for folder structures)
releaseKey <- AresIndexer::getSourceReleaseKey(connectionDetails, cdmDatabaseSchema)
datasourceReleaseOutputFolder <- file.path(aresDataRoot, releaseKey)

# run data quality dashboard and output results to data source release folder in ares data folder
dqResults <- DataQualityDashboard::executeDqChecks(
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = cdmDatabaseSchema,
    resultsDatabaseSchema = resultsDatabaseSchema,
    vocabDatabaseSchema = cdmDatabaseSchema,
    cdmVersion = cdmVersion,
    cdmSourceName = cdmSourceName,
    outputFile = "dq-result.json",
    outputFolder = datasourceReleaseOutputFolder
    #numThreads = numThreads,
    #sqlOnly = sqlOnly,
    # verboseMode = verboseMode,
    # writeToTable = writeToTable,
    # checkLevels = checkLevels,
    # checkNames = checkNames
)

# inspect logs
#ParallelLogger::launchLogViewer(logFileName = file.path(outputFolder, 
#                                                      sprintf("log_DqDashboard_%s.txt", cdmSourceName)))

# export the achilles results to the ares folder
Achilles::exportAO(
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = cdmDatabaseSchema,
    resultsDatabaseSchema = resultsDatabaseSchema,
    vocabDatabaseSchema = cdmDatabaseSchema,
    outputPath = aresDataRoot
)

# perform temporal characterization
outputFile <- file.path(datasourceReleaseOutputFolder, "temporal-characterization.csv")
Achilles::performTemporalCharacterization(
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = cdmDatabaseSchema,
    resultsDatabaseSchema = resultsDatabaseSchema,
    outputFile = outputFile
)

# augment concept files with temporal characterization data
AresIndexer::augmentConceptFiles(releaseFolder = file.path(aresDataRoot, releaseKey))

To Reproduce: Steps to reproduce the behavior:

  1. Install the R packages mentioned in the section below.
  2. Run the R code above.

Expected behavior: The script should finish without the following error:

Error in `count()`:
ℹ In argument: `CONCEPT_ID`.
Caused by error:
! object 'CONCEPT_ID' not found
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/dplyr:::mutate_error>
Error in `count()`:
ℹ In argument: `CONCEPT_ID`.
Caused by error:
! object 'CONCEPT_ID' not found
---
Backtrace:
  1. AresIndexer::augmentConceptFiles(...)
  4. dplyr:::count.data.frame(., CONCEPT_ID, tolower(CDM_TABLE_NAME))
  6. dplyr:::group_by.data.frame(x, ..., .add = TRUE, .drop = .drop)
  7. dplyr::group_by_prepare(.data, ..., .add = .add, error_call = current_env())
  8. dplyr:::add_computed_columns(.data, new_groups, error_call = error_call)
  9. dplyr:::mutate_cols(...)
 11. dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
 12. mask$eval_all_mutate(quo)
 13. dplyr (local) eval()
Run `rlang::last_trace()` to see the full context.

Desktop (please complete the following information):

alabarga commented 1 year ago

I'm getting the same error; this worked fine a couple of months ago. Maybe it is caused by changes in the Achilles or DataQualityDashboard outputs?
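
Since the suspicion here is a version change in one of the packages, it may help to post the installed versions alongside the error. The following is a minimal base-R sketch; the helper name `installed_versions` and the package list are illustrative assumptions, not part of any OHDSI package.

```r
# Hypothetical helper: report the installed version of each package,
# or NA when the package is not installed.
installed_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Packages involved in this thread (list assumed; adjust as needed)
print(installed_versions(c("Achilles", "DataQualityDashboard",
                           "AresIndexer", "DatabaseConnector", "dplyr")))
```

Comparing this output between a working and a failing environment should narrow down which package update changed the output format.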

alabarga commented 1 year ago

The AresIndexer::buildNetworkIndex() call does not work either:

Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `CheckResults.EXECUTION_TIME` doesn't exist.

alabarga commented 1 year ago

Maybe this is related to https://github.com/OHDSI/Ares/issues/199?

alabarga commented 1 year ago

Using

remotes::install_github('ohdsi/DatabaseConnector@v5.1.0')
remotes::install_github('OHDSI/DataQualityDashboard@v1.4.1', force=TRUE)

I managed to run AresIndexer::augmentConceptFiles(releaseFolder = file.path(aresDataRoot, releaseKey))

with this warning:

Warning message:
There was 1 warning in `filter()`.
ℹ In argument: `!is.na(results$CONCEPT_ID) && results$FAILED == 1`.
Caused by warning in `!is.na(results$CONCEPT_ID) && results$FAILED == 1`:
! 'length(x) = 2970 > 1' in coercion to 'logical(1)' 
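
The warning points at `&&` being applied to whole columns. `&&` expects length-one operands, whereas `&` works element by element, which is what a row filter needs. The sketch below illustrates the difference in base R on a made-up `results` data frame; apart from 3015182, which appears elsewhere in this thread, the values are hypothetical, and this is not the actual AresIndexer code.

```r
# Hypothetical stand-in for the data quality results table
results <- data.frame(
  CONCEPT_ID = c(3015182, NA, 40213251),  # made-up rows for illustration
  FAILED = c(1, 1, 0)
)

# Element-wise AND: one TRUE/FALSE per row, suitable for filtering.
# Using && here would trigger the length > 1 warning shown above
# (and an error in newer R versions).
keep <- !is.na(results$CONCEPT_ID) & results$FAILED == 1
failed_concepts <- results[keep, ]
print(failed_concepts)
```

Inside a dplyr `filter()` call the same condition would presumably be written with `&` and bare column names.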

I also managed to run

AresIndexer::buildNetworkIndex(sourceFolders = sourceFolders, outputFolder = aresDataRoot)
AresIndexer::buildDataQualityIndex(sourceFolders = sourceFolders, outputFolder = aresDataRoot)

but

AresIndexer::buildNetworkUnmappedSourceCodeIndex(sourceFolders = sourceFolders, outputFolder = aresDataRoot)

fails with


> AresIndexer::buildNetworkUnmappedSourceCodeIndex(sourceFolders = sourceFolders, outputFolder = aresDataRoot)
Error in `group_by()`:
! Must group by variables found in `.data`.
Column `CDM_TABLE_NAME` is not found.
Column `CDM_FIELD_NAME` is not found.
Column `SOURCE_VALUE` is not found.
Run `rlang::last_error()` to see where the error occurred.
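
All of the errors in this thread complain about columns that the code expects but the data does not have. A quick way to see the mismatch is to compare the expected names with the actual ones before calling the indexer. This is a base-R sketch only: the helper `missing_columns` is hypothetical, and the camelCase column names in the example data frame are an assumption chosen to illustrate a naming mismatch, not a confirmed description of the real files.

```r
# Hypothetical helper: which required columns are absent from df?
missing_columns <- function(df, required) {
  setdiff(required, names(df))
}

# Made-up data frame whose column names differ from the expected ones
unmapped <- data.frame(
  cdmTableName = "measurement",
  cdmFieldName = "measurement_source_value",
  sourceValue = "abc"
)

required <- c("CDM_TABLE_NAME", "CDM_FIELD_NAME", "SOURCE_VALUE")
print(missing_columns(unmapped, required))
```

An empty result means every required column is present; anything else lists the names the downstream `group_by()` would fail on.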

Hope it helps!

cc @clairblacketer

alabarga commented 1 year ago

Also, not all concepts seem to have been exported:

image

When are files such as data/Synthea/20230216/concepts/measurement/concept_3015182.json created? By the Achilles::exportAO() call?

alabarga commented 1 year ago

Maybe @fdefalco can help!