OHDSI / PatientLevelPrediction

An R package for performing patient level prediction in an observational database in the OMOP Common Data Model.
https://ohdsi.github.io/PatientLevelPrediction

Currently released version not passing R check #348

Closed: schuemie closed this issue 1 year ago

schuemie commented 1 year ago

I'm not sure why I haven't seen this before, but now I get this error message:

  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
   15. PatientLevelPrediction:::getEvaluationStatistics_binary(...)
   16. PatientLevelPrediction::calibrationLine(...)

  ══ Failed tests ════════════════════════════════════════════════════════════════
  ── Error ('test-featureEngineering.R:146'): randomForestFeatureSelection ───────
  Error in `checkRam(newcovariateData, 0.9)`: Insufficient RAM
  Backtrace:
      ▆
   1. └─PatientLevelPrediction:::randomForestFeatureSelection(...) at test-featureEngineering.R:146:2
   2.   └─PatientLevelPrediction::toSparseM(trainData)
   3.     └─PatientLevelPrediction:::checkRam(newcovariateData, 0.9)

  [ FAIL 1 | WARN 4 | SKIP 0 | PASS 847 ]
  Error: Test failures
  Execution halted
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking running R code from vignettes ...
  ‘AddingCustomFeatureEngineering.Rmd’ using ‘UTF-8’... OK
  ‘AddingCustomModels.Rmd’ using ‘UTF-8’... OK
  ‘AddingCustomSamples.Rmd’ using ‘UTF-8’... OK
  ‘AddingCustomSplitting.Rmd’ using ‘UTF-8’... OK
  ‘BestPractices.rmd’ using ‘UTF-8’... OK
  ‘BuildingMultiplePredictiveModels.Rmd’ using ‘UTF-8’... OK
  ‘BuildingPredictiveModels.Rmd’ using ‘UTF-8’... OK
  ‘CreatingLearningCurves.Rmd’ using ‘UTF-8’... OK
  ‘CreatingNetworkStudies.Rmd’ using ‘UTF-8’... OK
  ‘InstallationGuide.Rmd’ using ‘UTF-8’... OK
  ‘Videos.rmd’ using ‘UTF-8’... OK
 NONE
* checking re-building of vignette outputs ... OK
* DONE

Status: 1 ERROR
See

Any idea what is going on?

schuemie commented 1 year ago

Note: as an experiment, I deployed a new GitHub Action that will run R check on PLP every week, on the weekend (see fe2e747017783ad450e4f95bf835c826106b2f54). Right now the workflow is just a vanilla R check script; I didn't include Python. Feel free to add it if you think that is necessary.

schuemie commented 1 year ago

Weird: the R check on PLP main itself just passed: https://github.com/OHDSI/PatientLevelPrediction/actions/runs/3747241395

I'll run the HADES-wide check again.

jreps commented 1 year ago

PLP has a check when converting the plpData into a sparse matrix to ensure you have enough RAM. If it estimates you have insufficient RAM available, it stops with that error. So the machine running the tests must be overloaded.
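
Roughly, the idea is something like this (just a sketch to show the logic; the function name, the per-entry size estimate, and the freeRamBytes argument are placeholders, not the actual implementation):

```r
# Sketch of a pre-check like checkRam(); names and the size estimate are
# placeholders, not PLP's actual code. The 0.9 threshold matches the backtrace.
checkRamSketch <- function(covariateData, freeRamBytes, maxFraction = 0.9) {
  # The covariates table has one row per non-zero value; assume roughly
  # 16 bytes per entry (8-byte double plus index overhead) once in RAM.
  nEntries <- dplyr::pull(dplyr::count(covariateData$covariates))
  estimatedBytes <- nEntries * 16
  if (estimatedBytes > maxFraction * freeRamBytes) {
    stop("Insufficient RAM")  # the error seen in the failing tests
  }
  invisible(TRUE)
}
```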

schuemie commented 1 year ago

Never mind. It is passing now...

schuemie commented 1 year ago

This test seems to be failing intermittently. Is there maybe a way to make sure it always passes?

── Error ('test-featureEngineering.R:146'): randomForestFeatureSelection ───────
Error in `checkRam(newcovariateData, 0.9)`: Insufficient RAM
Backtrace:
    ▆
 1. └─PatientLevelPrediction:::randomForestFeatureSelection(...) at test-featureEngineering.R:146:2
 2.   └─PatientLevelPrediction::toSparseM(trainData)
 3.     └─PatientLevelPrediction:::checkRam(newcovariateData, 0.9)
── Error ('test-featureEngineering.R:161'): featureSelection is applied on test_data ──
Error in `checkRam(newcovariateData, 0.9)`: Insufficient RAM
Backtrace:
    ▆
 1. └─PatientLevelPrediction:::univariateFeatureSelection(...) at test-featureEngineering.R:161:2
 2.   └─PatientLevelPrediction::toSparseM(trainData, trainData$labels)
 3.     └─PatientLevelPrediction:::checkRam(newcovariateData, 0.9)

As you can see, it passed two weeks in a row, but on the third week it failed: https://github.com/OHDSI/PatientLevelPrediction/actions/runs/3862053323/jobs/6583400031

egillax commented 1 year ago

Personally I would be in favor of removing the checkRam check. Just because "free" memory is low doesn't mean the memory isn't available if a process requests it. See for example this Stack Overflow answer about macOS in particular. I'll quote the relevant bit:

So in short, you actually shouldn't care if your free memory is low. In fact, you want it to be low -- free memory is wasted memory (as the OS isn't using it for anything).

So I think the checkRam check can give an error even when you wouldn't actually run into out-of-memory issues loading the data into memory, and I'm pretty sure that is the case for the failure above. According to the information I can find, the macOS runners should have 14 GB of memory, and the simulated data used for the tests should never come close to that: the covariates table should be about 50 MB.

So my suggestion is to not check the RAM before loading the sparse data into memory; just try it and let it fail if it's too big. Depending on the failure, we can give an informative error message using a tryCatch. This assumes a failure won't crash the whole R session, but that should be easy to test beforehand.
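
Something along these lines, perhaps (a sketch of the wiring; the message text and matching on R's usual "cannot allocate" wording are assumptions):

```r
# Attempt the conversion directly and turn an allocation failure into an
# informative error; no pre-check of free RAM.
matrixData <- tryCatch(
  PatientLevelPrediction::toSparseM(trainData),
  error = function(e) {
    if (grepl("cannot allocate", conditionMessage(e), fixed = TRUE)) {
      stop("Ran out of memory while converting the data to a sparse matrix. ",
           "Close other R sessions or use a machine with more RAM.",
           call. = FALSE)
    }
    stop(e)
  }
)
```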

jreps commented 1 year ago

I think the original motivation was that I would run something and it would take hours before PLP failed due to RAM issues, so I added the check to fail quicker, but it is overly sensitive.

I recently realized that I think FE extracts all the data into RAM when it downloads the covariates, so in theory, if a computer does not have sufficient RAM it would fail when extracting the data. If the data does get downloaded, then the computer is probably good enough, but the user could have multiple R sessions (or other things) taking up RAM. Maybe we could just issue a warning about the RAM (so the user could cancel if they wanted) rather than stopping?

If we remove the RAM check completely, we can get rid of a dependency which is good.

schuemie commented 1 year ago

FE does not extract all data into RAM. It downloads directly to an Andromeda object, keeping only small chunks in RAM at a time.
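
As a toy illustration of the pattern (not FE's actual code): Andromeda keeps the table on disk and streams it through RAM in batches.

```r
library(Andromeda)

# The table lives on disk; batchApply() processes it in small chunks,
# so only one chunk is in RAM at any time.
andr <- andromeda(covariates = data.frame(rowId = 1:1000,
                                          covariateId = 1002,
                                          covariateValue = 1))
batchApply(andr$covariates, function(batch) {
  message("Processing a chunk of ", nrow(batch), " rows")
}, batchSize = 250)
close(andr)
```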

jreps commented 1 year ago

That's what I thought too, but my R session crashed when I extracted covariates from a very large cohort. I was warned about the issue by others as well.

schuemie commented 1 year ago

Please document that issue at the appropriate package repo, ideally with a reproducible example.

egillax commented 1 year ago

@jreps I think maybe we should start by changing checkRam to a warning so it doesn't break the tests, although I would still prefer to remove it. The issue you mentioned (failing after many hours) might not be relevant anymore, since most things are now wrapped in tryCatch, so if one thing fails it moves on to the next. Except if the R session itself crashes, I guess. But if you feel the check is still useful, I'd be fine with a warning instead of an error.

I still think the memuse package might not be counting "inactive/cached/buffered" memory correctly on macOS, but I don't have a machine to test on.

jreps commented 1 year ago

I just tested it on my Mac, and it does seem it may be incorrect. Shall we remove the checkRam check and just have a warning that prints the estimated amount of RAM required?
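
For example, something like this (a sketch; the size estimate and the wording are placeholders, not an agreed-upon implementation):

```r
# Warn with the estimated requirement instead of stopping, so the user can
# decide whether to cancel. The 16-bytes-per-entry estimate is a placeholder.
warnRamSketch <- function(covariateData) {
  nEntries <- dplyr::pull(dplyr::count(covariateData$covariates))
  estimatedGb <- nEntries * 16 / 1024^3
  warning(sprintf("Converting to a sparse matrix is estimated to need about %.1f GB of RAM.",
                  estimatedGb), call. = FALSE)
  invisible(estimatedGb)
}
```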

egillax commented 1 year ago

Sounds good to me