MetOffice / dagrunner

⛔[EXPERIMENTAL] Directed acyclic graph (DAG) runner and tools
BSD 3-Clause "New" or "Revised" License

ENH: Remote data polling and added a load abstract class #52

Closed: cpelley closed this 1 month ago

cpelley commented 1 month ago

In an ideal network world, connections always succeed. In reality, connection attempts fail and established connections get dropped. I wondered whether this should be wrapped in a short retry loop (maybe 2 or 3 attempts) that tries again in the event of a failure. This could greatly improve the reliability of the scp command.

@mo-robert-purvis, indeed. Thanks for drawing attention to this. I'm hoping that this and other potentially momentary failures (e.g. Lustre file system issues) will be handled by the scheduler's fault handling, but we will have to wait and see what it can do and how. Perhaps adding fault handling internally will be necessary; TBD.
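If fault handling does end up living inside dagrunner rather than in the scheduler, the short retry loop suggested above could look something like the sketch below. It is illustrative only: `retry` and `scp_with_retry` are hypothetical names, not part of dagrunner, and the attempt count and delay are assumed values. It relies only on the standard library, using `subprocess.run(..., check=True)` so that a non-zero exit from `scp` raises `subprocess.CalledProcessError`.

```python
import subprocess
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Hypothetical helper: call ``func`` up to ``attempts`` times.

    Sleeps ``delay`` seconds between failed attempts. Only exceptions listed
    in ``exceptions`` trigger a retry; anything else propagates immediately.
    If every attempt fails, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: propagate the final failure
            time.sleep(delay)  # brief pause before trying again
    raise AssertionError("unreachable")  # the loop always returns or raises


def scp_with_retry(source: str, destination: str, attempts: int = 3) -> None:
    """Hypothetical helper: copy a file with scp, retrying transient failures.

    Only CalledProcessError (scp exited non-zero) is retried. A missing scp
    binary raises FileNotFoundError, which is not transient, so it fails fast.
    """
    retry(
        lambda: subprocess.run(["scp", source, destination], check=True),
        attempts=attempts,
        delay=2.0,  # assumed pause between attempts
        exceptions=(subprocess.CalledProcessError,),
    )
```

Restricting the retried exceptions keeps genuinely broken configurations (a wrong path, a missing binary) from being masked by repeated attempts. Adding exponential backoff, or deferring entirely to the scheduler's own retry mechanism, would be natural alternatives.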