OHDSI / CohortGenerator

An R package for instantiating cohorts using data in the CDM.
https://ohdsi.github.io/CohortGenerator/
11 stars 10 forks source link

We should provide `time-out` functionality #107

Open msuchard opened 1 year ago

msuchard commented 1 year ago

Really helpful in some data sources would be a time-out feature in which the package kills a query (and then importantly continues onto the next task) that executes for longer than a specified time.

Might this be easily feasible @azimov @anthonysena ?

azimov commented 1 year ago

I think this should be possible to implement with something like R.utils::withTimeout though we would need to test how well this plays with DatabaseConnector and its calls to java.

I think there are some other questions about the default behaviour - I assume this is to have no timeout rather than assert that no cohort should take more than 10 hours to execute?

anthonysena commented 9 months ago

Think we'll need to think about the behavior but seems possible and reasonable. I suppose we'd want to apply a uniform timeout to call cohort generations in this case. I agree with @azimov that we'd want to test this out to make sure that the timeout plays nicely with database connections.

anthonysena commented 2 weeks ago

I have an initial implementation of this feature on the time-out branch and relevant changes are here: https://github.com/OHDSI/CohortGenerator/commit/aadf6240bfb0ec97d5548f1e3a4d1611ec8557ed.

My main concern with the current implementation is the database connection handling. The generateCohortSet function opens a single database connection which is passed into an internal function generateCohort and while the timeout does stop the cohort generation (on the R side) it doesn't forcibly close the database connection which would lead to connection leaking thus defeating the intent of this functionality.

The work on this branch could be further extended such that when a timeout > 0 is specified, we pass only the connectionDetails to the generateCohort function so that a single connection is used for this operation. Then when a timeout is encountered, we forcibly close the connection.

I was also having a bit of trouble with the unit tests for this functionality so stopped short of getting this into the v0.10 release. If there is a desire to move this functionality forward, please let me know and I can try to finish it up for another v0.x release.