RhodiumGroup / rhg_compute_tools

Tools for using compute.rhg.com and compute.impactlab.org
MIT License
1 stars 4 forks source link

add retry func #100

Closed bolliger32 closed 3 years ago

bolliger32 commented 3 years ago

Added a decorator to execute a function with timeout features, either locally or from a dask worker.

bolliger32 commented 3 years ago

@delgadom

Is deleting the future all that's required to make this work in the dask version?

Nope. In fact, we don't really need to delete the future so I've just taken that out. We're redefining fut each loop so explicitly deleting it doesn't really do anything.

Does this work with complex cases like dask->r2py->fortran or just pure python functions?

Good q. It has for my particular dask->rpy2->fortran case so I would imagine that it handles the general case where there are subprocesses called from within func. I've added rpy2 to the test to make sure it can handle that. If you can think of a good quick fortran (or some other compiled language) subprocess call to add to the tests let me know.

And does this distinguish between futures waiting to be executed vs. actually taking this amount of time?

For the dask case, no. Right now there is a tradeoff - either you use the non-dask version and get the timeout clock starting at the beginning of execution, but in this case you can only call this from a worker if threads_per_worker=1. Or you use the dask version, which can execute on any dask worker but the timeout call starts essentially from task submission. I don't have a great way to handle the case where you want the best of both worlds. Daniel and I tried some things yesterday where you like check the scheduler to see which jobs are processing, etc. But it seemed to produce weird results and slow things down immensely. Haven't looked into that option too much yet.

I tried to clarify this (admittedly confusing) caveat in the docstring, but let me know if you think it could use some additional clarification.

delgadom commented 3 years ago

ohhh got it. I read that but didn't fully appreciate that job submission could include "pending" time. that makes sense. this seems like a great, useful tool and we can keep chipping away at the magic bullet that cleans up lists of futures or something but this definitely passes the bar for me of being a valuable addition. Thanks! and merge away whenever you think it's ready