OHDSI / FeatureExtraction

An R package for generating features (covariates) for a cohort using data in the Common Data Model.
http://ohdsi.github.io/FeatureExtraction/

Java error from getDbCovariateData #146

Open OskarGauffin opened 3 years ago

OskarGauffin commented 3 years ago

Hello!

After many months of work without problems, we've recently come across a bug on two separate computers that seems to originate from Java.

"Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Integer (java.lang.String and java.lang.Integer are in module java.base of loader 'bootstrap')"

We feed a cohort table to getDbCohortMethodData, which calls getDbCovariateData, which in turn calls getDbDefaultCovariateData. Running the code there line-by-line points toward the following call:

rJava::J("org.ohdsi.featureExtraction.FeatureExtraction")$createSql(settings, aggregated, cohortTable, rowIdField, java_array, cdmDatabaseSchema)

We're running SQL Server; this is the Java version information:

The input arguments to this rJava::J call are not very helpful as a reproducible example, since restarting R and feeding in the same arguments gives a different error message ("Error in jclassName(class, class.loader = class.loader) : java.lang.ClassNotFoundException"). But here are our input arguments:

settings = '{\"temporal\":false,\"DemographicsGender\":true,\"DemographicsAge\":true,\"ConditionOccurrenceLongTerm\":true,\"ConditionGroupEraLongTerm\":true,\"DrugGroupEraLongTerm\":true,\"longTermStartDays\":-180,\"mediumTermStartDays\":-180,\"shortTermStartDays\":-90,\"endDays\":0,\"includedCovariateConceptIds\":[],\"addDescendantsToInclude\":true,\"excludedCovariateConceptIds\":[\"31317\",\"1139699\"],\"addDescendantsToExclude\":true,\"includedCovariateIds\":[]}'
aggregated = FALSE
cohortTable = "#cohort_person"
rowIdField = "subject_id"
java_array = rJava::.jarray(as.character(-1))
cdmDatabaseSchema = "OmopCdm.synpuf5pct_20180710"

If we feed drug_era and condition_era tables instead of cohort tables to getDbCohortMethodData, things work again. I've also tried to reproduce the bug starting from Eunomia, but it does not appear there.

connection <- DatabaseConnector::connect(Eunomia::getEunomiaConnectionDetails())
DatabaseConnector::executeSql(connection, "DROP TABLE IF EXISTS cohort_table")
DatabaseConnector::executeSql(connection, "CREATE TABLE cohort_table (SUBJECT_ID INT, COHORT_DEFINITION_ID INT, COHORT_START_DATE DATETIME, COHORT_END_DATE DATETIME)")
DatabaseConnector::executeSql(connection, "INSERT INTO cohort_table(SUBJECT_ID, COHORT_DEFINITION_ID, COHORT_START_DATE, COHORT_END_DATE) VALUES (1, 1, '2000-01-01 12:00:00', '2000-01-02 12:00:00')")
DatabaseConnector::querySql(connection, "SELECT * FROM cohort_table")
covSettings <- FeatureExtraction::createCovariateSettings(useConditionGroupEraLongTerm = TRUE,
                                                          excludedCovariateConceptIds = c(31317, 1139699))

covariateData <- FeatureExtraction::getDbCovariateData(connection = connection,
                                                       cdmDatabaseSchema = "main",
                                                       cdmVersion = 5,
                                                       cohortTable = "cohort_table",
                                                       cohortTableIsTemp = FALSE,
                                                       rowIdField = "SUBJECT_ID",
                                                       covariateSettings = covSettings)

Any ideas or input on how to solve this would be highly appreciated. Have there been any recent updates to FeatureExtraction or Java-related matters in the last week?

schuemie commented 3 years ago

Not sure if this is helpful, but in general I highly recommend not using OpenJDK, but Oracle Java instead.

saravidlin-umc commented 3 years ago

> Not sure if this is helpful, but in general I highly recommend not using OpenJDK, but Oracle Java instead.

I just tried to run with Oracle Java instead, but I still get the same error.

schuemie commented 3 years ago

Could you run

ParallelLogger::addDefaultErrorReportLogger()

Then rerun your code to recreate the error, and share the content of the error report?

saravidlin-umc commented 3 years ago

Thanks for helping out, Martijn! I ran it, but no error file was created. And if I create an empty file before running, it still has no content.

saravidlin-umc commented 3 years ago

Wait, of course. I removed a try-catch, and then it was indeed created :)

Thread: Main
Message: Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Integer (java.lang.String and java.lang.Integer are in module java.base of loader 'bootstrap')
Level: FATAL
Time: 2021-11-11 11:37:01

Stack trace:
15: logFatal(gsub("\n", " ", geterrmessage()))
14: (function () { logFatal(gsub("\n", " ", geterrmessage())) if (!is.null(previousErrorHandler)) { eval(previousErrorHandler) } })()
13: stop(list("java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Integer (java.lang.String and java.lang.Integer are in module java.base of loader 'bootstrap')
12: .jcheck(silent = FALSE)
11: .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, .jcast(if (inherits(o, "jobjRef") || inherits(o, "jarrayRef")) o else cl, "java/lang/Object"), .jnew("java/lang/String", method),
10: .jrcall(x, name, ...)
9: general_function_library.R#726: rJava::J("org.ohdsi.featureExtraction.FeatureExtraction")$createSql(settings, aggregated, cohortTable, rowIdField, rJava::.jarray(as.character(cohortId)), cdmDa
8: (function (connection, oracleTempSchema = NULL, cdmDatabaseSchema, cohortTable = "#cohort_person", cohortId = -1, cdmVersion = "5", rowIdField = "subject_id", covariateSettings, targetDatabase
7: do.call(eval(parse(text = fun)), args)
6: general_function_library.R#290: FeatureExtraction::getDbCovariateData(connection = connection, oracleTempSchema = tempEmulationSchema, cdmDatabaseSchema = cdmDatabaseSchema, cdmVersion = cdmVe
5: cohort_module.R#143: custom_getDbCohortMethodData(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, targetId = 21, comparatorId = 41, outcomeIds = 31, studyStartDat
4: withCallingHandlers(expr, message = function(c) if (inherits(c, classes)) tryInvokeRestart("muffleMessage"))
3: cohort_module.R#143: suppressMessages(custom_getDbCohortMethodData(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, targetId = 21, comparatorId = 41, outcomeIds =
2: execute.R#81: cohort_module(i, maximum_cohort_size = 50, force_create_new = TRUE, only_create_cohorts = FALSE, saddle)
1: execute(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, cohortDatabaseSchema = cohortDatabaseSchema, cohortTable = cohortTable, outputFolderPath = outputFolderPat

R version: R version 4.0.5 (2021-03-31)

Platform: x86_64-w64-mingw32

Attached base packages:

Other attached packages:
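
A note on why the report only appeared once the try-catch was removed: judging by frames 14 and 15 of the stack trace, the report logger appears to run as a top-level error handler, and an error that is caught by tryCatch never reaches that level. The base R sketch below illustrates the difference between swallowing an error and recording it while letting it propagate. failingStep is a hypothetical stand-in for the failing call, and the logged vector stands in for a real log file; neither is part of ParallelLogger or FeatureExtraction.

```r
# Hypothetical stand-in for the failing call (assumption, for illustration only).
failingStep <- function() stop("String cannot be cast to Integer")

# Stand-in for a log file.
logged <- character(0)

# Pattern 1: tryCatch handles the error and returns NULL.
# Nothing propagates further, so a top-level error handler never sees it.
swallowed <- tryCatch(failingStep(), error = function(e) NULL)

# Pattern 2: a calling handler records the message and then returns,
# so the error keeps propagating to whoever is further up the call stack.
# The outer tryCatch exists only so that this demo script keeps running.
propagated <- tryCatch(
  withCallingHandlers(
    failingStep(),
    error = function(e) logged <<- c(logged, conditionMessage(e))
  ),
  error = function(e) "error reached the caller"
)

print(swallowed)   # NULL
print(logged)      # "String cannot be cast to Integer"
print(propagated)  # "error reached the caller"
```

If a wrapper has to intercept errors, the second pattern keeps them visible to loggers installed higher up.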

saravidlin-umc commented 3 years ago

Note/Explanation: In the execution above we used a copy of the original function getDbCohortMethodData(), called custom_getDbCohortMethodData(), that we use for debugging. However, the same problem arises if I call the built-in function directly:

Thread: Main
Message: Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Integer (java.lang.String and java.lang.Integer are in module java.base of loader 'bootstrap')
Level: FATAL
Time: 2021-11-11 14:11:44

Stack trace:
15: logFatal(gsub("\n", " ", geterrmessage()))
14: (function () { logFatal(gsub("\n", " ", geterrmessage())) if (!is.null(previousErrorHandler)) { eval(previousErrorHandler) } })()
13: stop(list("java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Integer (java.lang.String and java.lang.Integer are in module java.base of loader 'bootstrap')
12: .jcheck(silent = FALSE)
11: .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, .jcast(if (inherits(o, "jobjRef") || inherits(o, "jarrayRef")) o else cl, "java/lang/Object"), .jnew("java/lang/String", method),
10: .jrcall(x, name, ...)
9: general_function_library.R#711: rJava::J("org.ohdsi.featureExtraction.FeatureExtraction")$createSql(settings, aggregated, cohortTable, rowIdField, rJava::.jarray(as.character(cohortId)), cdmDa
8: (function (connection, oracleTempSchema = NULL, cdmDatabaseSchema, cohortTable = "#cohort_person", cohortId = -1, cdmVersion = "5", rowIdField = "subject_id", covariateSettings, targetDatabase
7: do.call(eval(parse(text = fun)), args)
6: FeatureExtraction::getDbCovariateData(connection = connection, oracleTempSchema = tempEmulationSchema, cdmDatabaseSchema = cdmDatabaseSchema, cdmVersion = cdmVersion, cohortTable = cohortTable
5: cohort_module.R#142: CohortMethod::getDbCohortMethodData(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, targetId = 21, comparatorId = 41, outcomeIds = 31, studyS
4: withCallingHandlers(expr, message = function(c) if (inherits(c, classes)) tryInvokeRestart("muffleMessage"))
3: cohort_module.R#142: suppressMessages(CohortMethod::getDbCohortMethodData(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, targetId = 21, comparatorId = 41, outcom
2: execute.R#81: cohort_module(i, maximum_cohort_size = 50, force_create_new = TRUE, only_create_cohorts = FALSE, saddle)
1: execute(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, cohortDatabaseSchema = cohortDatabaseSchema, cohortTable = cohortTable, outputFolderPath = outputFolderPat

R version: R version 4.0.5 (2021-03-31)

Platform: x86_64-w64-mingw32

Attached base packages:

Other attached packages:

schuemie commented 3 years ago

Sorry, I'm still at a loss as to what is going wrong. I've simplified the problem to just the Java call that is throwing the error. Could you try running this? (It doesn't throw an error for me.)

# Build the covariate settings and serialize them to JSON for the Java layer
covariateSettings <- FeatureExtraction::createCovariateSettings(useConditionGroupEraLongTerm = TRUE, 
                                                                excludedCovariateConceptIds = c(31317, 1139699))
settings <- FeatureExtraction:::.toJson(covariateSettings)

# Remaining arguments of the failing call
cohortTable <- "cohort_table"
cohortId <- -1
cdmDatabaseSchema <- "main"
rowIdField <- "SUBJECT_ID"
aggregated <- FALSE

# Initialize the Java side with the package install location, then make the call that throws
rJava::J("org.ohdsi.featureExtraction.FeatureExtraction")$init(system.file("", package = "FeatureExtraction"))
json <- rJava::J("org.ohdsi.featureExtraction.FeatureExtraction")$createSql(settings, aggregated, cohortTable, rowIdField, rJava::.jarray(as.character(cohortId)), cdmDatabaseSchema)

saravidlin-umc commented 3 years ago

Thanks Martijn! I can indeed run your code.

I've now taken our own code and made the simplest version I could come up with that still emulates our call to getDbCohortMethodData(), and that also runs. I will work my way up to the code we have today, bit by bit, and see exactly what eventually makes the call fail.

saravidlin-umc commented 3 years ago

This is of course what we should have done to start with.

We accidentally used strings instead of integers in the parameter excludedCovariateConceptIds in the function createCovariateSettings. This caused the error.

Thank you for your help and patience, and sorry for creating confusion and taking your time.
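
For readers who land here with the same ClassCastException: the settings JSON in the original post shows the excluded IDs as quoted strings, which is consistent with this diagnosis. The base R sketch below illustrates the difference and one way to coerce the values before they are passed on. toJsonArray is a hypothetical helper written for this illustration; it is not the serializer that FeatureExtraction uses.

```r
# IDs as they were passed by accident (character) and as intended (numeric).
idsAsStrings <- c("31317", "1139699")
idsAsNumbers <- c(31317, 1139699)

# Hypothetical helper (assumption): renders a vector as a JSON array,
# quoting character values and leaving numeric values bare.
toJsonArray <- function(x) {
  items <- if (is.character(x)) {
    paste0('"', x, '"')
  } else {
    format(x, scientific = FALSE, trim = TRUE)
  }
  paste0("[", paste(items, collapse = ","), "]")
}

# Character IDs end up as JSON strings; numeric IDs end up as JSON numbers.
print(toJsonArray(idsAsStrings))  # ["31317","1139699"]
print(toJsonArray(idsAsNumbers))  # [31317,1139699]

# Coercing before building the settings avoids the mismatch.
fixedIds <- as.numeric(idsAsStrings)
print(identical(fixedIds, idsAsNumbers))  # TRUE
```

A quick is.numeric() check on the vector before calling createCovariateSettings would catch the same mistake.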

schuemie commented 3 years ago

Great to hear the problem is solved. Some more informative error messages would have helped ;-)

@anthonysena: Perhaps some input checks could be added? Perhaps in the createCovariateSettings and createDefaultCovariateSettings functions. (Note that these functions are automatically generated based on the CSVs, so edits would have to be made in the template)
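
As a rough idea of what such an input check might look like, here is a minimal base R sketch. checkConceptIds is a hypothetical name and not an existing FeatureExtraction function; where it would be called from the generated createCovariateSettings code is likewise left open.

```r
# Hypothetical validator (assumption): fails early with a readable message
# instead of letting a ClassCastException surface from the Java layer.
checkConceptIds <- function(ids, argName = "conceptIds") {
  # An empty vector (the default c()) is allowed.
  if (length(ids) == 0) {
    return(invisible(ids))
  }
  if (!is.numeric(ids)) {
    stop(sprintf("'%s' must be numeric, but has class '%s'. Were the IDs passed as strings?",
                 argName, class(ids)[1]),
         call. = FALSE)
  }
  if (anyNA(ids) || any(ids != round(ids))) {
    stop(sprintf("'%s' must contain whole numbers without missing values.", argName),
         call. = FALSE)
  }
  invisible(ids)
}

# Numeric IDs pass silently.
checkConceptIds(c(31317, 1139699), "excludedCovariateConceptIds")

# Character IDs produce an informative error; it is captured here only to show the message.
msg <- tryCatch(
  checkConceptIds(c("31317", "1139699"), "excludedCovariateConceptIds"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Failing in R, before the settings are serialized, keeps the error message close to the argument that caused it.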