OHDSI / Andromeda

AsynchroNous Disk-based Representation of MassivE DAta: An R package aimed at replacing ff for storing large data objects.
https://ohdsi.github.io/Andromeda/

java.lang.OutOfMemoryError: Java heap space #26

Closed: jreps closed this issue 2 years ago

jreps commented 2 years ago

I keep getting the Java heap error (Jill had a similar error as well) when extracting data into an SQLite database. Jill hit the error when extracting a single big cohort; I get it after running a few cohort extracts. The error goes away for a while after I terminate and restart R, but then reappears. Is there a memory leak somewhere?

Connecting using Redshift driver

Constructing the at risk cohort
  |======================================================| 100%
Executing SQL took 1.46 secs
Fetching cohorts from server
Loading cohorts took 24.6 secs
Sending temp tables to server
Constructing features on server
  |======================================================| 100%
Executing SQL took 1.34 mins
Fetching data from server
Error: Error executing SQL: java.lang.OutOfMemoryError: Java heap space
An error report has been created at C:/Users/admin_jreps/Documents/errorReportSql.txt
Run rlang::last_error() to see where the error occurred.

rlang::last_error()
<error/rlang_error>
Error executing SQL:
java.lang.OutOfMemoryError: Java heap space
An error report has been created at C:/Users/admin_jreps/Documents/errorReportSql.txt
Backtrace:

  1. base::source(...)
  2. PatientLevelPrediction::similarPlpData(...) D:/internalVal/real_data_extra_validation_code.R:51:2
  3. FeatureExtraction::getDbCovariateData(...) D:/GitHub/PatientLevelPrediction/R/SaveLoadPlp.R:175:2
  4. DatabaseConnector::querySqlToAndromeda(...) D:/GitHub/FeatureExtraction/R/GetDefaultCovariates.R:96:6
  5. base::tryCatch(...)
  6. base:::tryCatchList(expr, classes, parentenv, handlers)
  7. base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
  8. value[3L]
  9. DatabaseConnector:::.createErrorReport(...)

Run rlang::last_trace() to see the full context.
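For context, the failure surfaces inside DatabaseConnector::querySqlToAndromeda(), which FeatureExtraction uses to stream the covariate query into an Andromeda (SQLite) object. A minimal sketch of that call pattern, where connectionDetails and sql are hypothetical placeholders rather than values from this report:

```r
library(DatabaseConnector)
library(Andromeda)

# connectionDetails and sql are hypothetical placeholders for illustration.
connection <- connect(connectionDetails)
covariateData <- andromeda()

# Streams the server-side result set into a table inside the Andromeda
# object; the OutOfMemoryError above is thrown on the Java side while
# fetching rows, before they reach Andromeda.
querySqlToAndromeda(
  connection = connection,
  sql = sql,
  andromeda = covariateData,
  andromedaTableName = "covariates"
)

disconnect(connection)
```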
jreps commented 2 years ago

Here is the error report:

DBMS: redshift

Error: java.lang.OutOfMemoryError: Java heap space

SQL:

SELECT * FROM (
  SELECT row_id, covariate_id, covariate_value FROM #cov_1 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_2 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_3 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_4 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_5 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_6 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_7 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_8 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_9 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_10 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_11 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_12 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_13 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_14 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_15 UNION ALL
  SELECT row_id, covariate_id, covariate_value FROM #cov_16
) all_covariates;

R version: R version 4.0.5 (2021-03-31)

Platform: x86_64-w64-mingw32

Attached base packages:

Other attached packages:

msuchard commented 2 years ago

One approach might be to load up a Java profiler (I use, e.g., IntelliJ) and attach it to the JVM that R kicks off. You should then be able to watch in real time the construction and destruction of Java objects, along with the calling code line numbers.
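Since DatabaseConnector drives the JVM through rJava, the JVM runs inside the R process itself, so the profiler attaches to R's process id. A minimal sketch:

```r
# The JVM started by rJava lives in-process, so attach the profiler
# (IntelliJ's, jvisualvm, ...) to the R process id reported here.
library(rJava)
.jinit()      # ensure the JVM has been initialized
Sys.getpid()  # process id to attach the profiler to
```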

Alternatively, you might try increasing the heap size for your JVM. From the command line, the -X* options (e.g., -Xmx) are useful. I am not sure how to specify these from R's .onLoad function.
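For reference, rJava reads the java.parameters option when it first initializes the JVM, so heap flags can be set from R before any Java-backed package loads (the next comment explains why this is discouraged here). A minimal sketch:

```r
# Must run before the JVM starts, i.e. before loading rJava or any package
# that uses it; it has no effect on an already-running JVM. The 8g value is
# an arbitrary example.
options(java.parameters = "-Xmx8g")
library(DatabaseConnector)  # JVM now starts with the larger maximum heap
```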

schuemie commented 2 years ago

I don't want people increasing the heap size, because:

  1. It doesn't solve the problem (DatabaseConnector grabs a percentage of the available heap space, so a larger heap just means grabbing more memory).
  2. I still see people increasing the heap size because they thought it solved a problem many years ago. Running HADES code should not require any magical statements beforehand.

I tried to debug this issue a few weeks ago, but was stuck because the problem only occurs after running a very lengthy script (several days). If you then restart the script to pick up where it left off, it continues without error. Somewhere, something doesn't clean up after itself in Java heap space. Setting up a profiler is a bit tricky, since it would need to run against R, where all the many steps needed to reproduce the problem are initiated.
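A lighter-weight alternative to a full profiler is to poll the JVM heap from R between steps of the long-running script and log whether usage ratchets up without being released. A hedged sketch using the standard java.lang.Runtime API via rJava:

```r
# Hedged sketch: report current JVM heap usage (in MB) from R via rJava.
library(rJava)
.jinit()

jvmHeapUsedMb <- function() {
  runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
  total <- .jcall(runtime, "J", "totalMemory")
  free <- .jcall(runtime, "J", "freeMemory")
  (total - free) / 1024^2
}

jvmHeapUsedMb()  # call between cohort extracts to spot a leak's growth
```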

Since Andromeda doesn't run in Java, I don't think this is an Andromeda issue.

schuemie commented 2 years ago

Moving this issue here