vine: treat worker:// and http:// differently

cooperative-computing-lab / cctools

The Cooperative Computing Tools (cctools) enable large scale distributed computations to harness hundreds to thousands of machines from clusters, clouds, and grids.



Continued discussion from #3729

In the current master branch, when an input URL for a task fails to transfer, the task is retried indefinitely. Previously, the task failed immediately with an input-missing result. This was changed because URL transfers often come from other workers, which are subject to transient errors.

One view is that http:// errors are the responsibility of the application, while worker:// errors are the responsibility of taskvine proper. For example, a task whose http:// input fails could return immediately with input missing, while a failed worker:// transfer could be retried indefinitely (by fetching from other workers, running recovery tasks, etc.).
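A minimal sketch of that split, assuming the decision is made purely on the URL scheme. `Action` and `on_transfer_failure` are hypothetical names for illustration, not part of the taskvine API:

```python
from enum import Enum
from urllib.parse import urlparse


class Action(Enum):
    # Hypothetical outcomes for a failed input transfer.
    FAIL_INPUT_MISSING = "fail_input_missing"  # hand the failure back to the application
    RETRY = "retry"  # taskvine keeps trying (other workers, recovery tasks)


def on_transfer_failure(source_url: str) -> Action:
    """Hypothetical policy: decide what to do when an input transfer fails."""
    scheme = urlparse(source_url).scheme
    if scheme == "worker":
        # Worker-to-worker transfers are taskvine's responsibility;
        # failures are assumed transient and retried.
        return Action.RETRY
    # http://, https://, and any other external scheme are treated as the
    # application's responsibility: report input missing immediately.
    return Action.FAIL_INPUT_MISSING
```

With this rule, `on_transfer_failure("worker://10.0.0.5:9123/file.1")` yields `Action.RETRY`, while `on_transfer_failure("http://example.com/data.tar.gz")` yields `Action.FAIL_INPUT_MISSING`.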

Another option is to add parameters to declare_url that would let taskvine judge the health of a source, such as an acceptable failure rate per minute or a maximum number of connections.
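One way such parameters could be enforced is a per-source tracker with a sliding one-minute failure window and a connection cap. This is a sketch under assumptions: `SourceHealth`, `max_failures_per_minute`, and `max_connections` are hypothetical names, and `now` is passed in explicitly (in seconds) so the logic is easy to test:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SourceHealth:
    """Hypothetical per-source limits that declare_url might accept."""
    max_failures_per_minute: int
    max_connections: int
    active_connections: int = 0
    _failure_times: deque = field(default_factory=deque)

    def _prune(self, now: float) -> None:
        # Drop failures that fall outside the 60-second window.
        while self._failure_times and now - self._failure_times[0] >= 60.0:
            self._failure_times.popleft()

    def is_healthy(self, now: float) -> bool:
        # Healthy while recent failures stay within the acceptable rate.
        self._prune(now)
        return len(self._failure_times) <= self.max_failures_per_minute

    def can_start_transfer(self, now: float) -> bool:
        # A new transfer needs a healthy source and a free connection slot.
        return self.is_healthy(now) and self.active_connections < self.max_connections

    def start_transfer(self) -> None:
        self.active_connections += 1

    def finish_transfer(self, ok: bool, now: float) -> None:
        # Release the connection slot and remember failures for the rate check.
        self.active_connections -= 1
        if not ok:
            self._failure_times.append(now)
            self._prune(now)
```

Under this design, a scheduler could keep retrying while `is_healthy` holds and fall back to failing the task with input missing once the source exceeds its declared failure rate, so the same mechanism could cover both worker:// and http:// sources.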


vine: treat worker:// and http:// differently #3731