Open btovar opened 6 months ago
Yes, I agree with this interpretation. HTTP transfers and worker transfers differ primarily because the latter allow failures to be handled internally. However, note that there is a distinction between the original source of a file and its current location. A file obtained by HTTP can be put into the cluster and then transferred laterally using worker transfers. That distinction is not cleanly maintained in the worker.
Continued discussion from #3729
In the current master, when an input URL for a task fails to transfer, the task is retried indefinitely. Previously, the task would fail immediately with input missing. This was changed because URL transfers often come from workers, which are subject to transient errors.
One view is that http:// errors are the responsibility of the application, while worker:// errors are the responsibility of TaskVine itself. For example, a task with http:// errors could return immediately with input missing, while worker:// errors could be retried indefinitely (with transfers from other workers, recovery tasks, etc.).
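To make the proposed split concrete, here is a minimal sketch of how a failure could be dispatched on the scheme of the source URL. It is only an illustration of the idea, not how TaskVine is implemented: `Action` and `on_transfer_failure` are hypothetical names, and the example URLs are made up.

```python
from enum import Enum
from urllib.parse import urlparse


class Action(Enum):
    """Hypothetical outcomes for a failed input transfer."""
    FAIL_TASK = "return task with input missing"
    RETRY = "retry transfer from another source"


def on_transfer_failure(source_url: str) -> Action:
    """Decide what to do when an input transfer fails.

    Assumption for this sketch: worker:// sources are internal, so the
    system retries them; any other scheme (http://, https://, ...) is
    external, so the failure is handed back to the application.
    """
    scheme = urlparse(source_url).scheme
    if scheme == "worker":
        return Action.RETRY
    return Action.FAIL_TASK


if __name__ == "__main__":
    for url in ("http://example.com/data.tar.gz",
                "worker://10.0.0.5:9123/cache/file-abc"):
        print(url, "->", on_transfer_failure(url).name)
```

Note that this dispatches on the current location of the file, not its original source, so a file first fetched over HTTP and then replicated between workers would fall under the retry branch.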
Another option is to add parameters to declare_url that would allow TaskVine to determine the health of the source: acceptable fail rate per minute, maximum number of connections, etc.
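As a rough sketch of what such health parameters could mean in practice, the class below tracks failures in a sliding one-minute window and caps concurrent connections. Everything here is hypothetical: `SourceHealth`, `max_failures_per_minute`, and `max_connections` are illustrative names, not existing declare_url arguments.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SourceHealth:
    """Hypothetical per-source health tracker (not part of TaskVine)."""
    max_failures_per_minute: int = 5   # acceptable fail rate per minute
    max_connections: int = 10          # maximum concurrent transfers
    active: int = 0                    # transfers currently in flight
    _failures: deque = field(default_factory=deque)

    def record_failure(self, now=None):
        """Remember the time of a failed transfer."""
        now = time.monotonic() if now is None else now
        self._failures.append(now)

    def _recent_failures(self, now):
        """Drop failures older than 60 seconds and count the rest."""
        while self._failures and now - self._failures[0] > 60.0:
            self._failures.popleft()
        return len(self._failures)

    def can_start_transfer(self, now=None):
        """True if the source is under both its connection and failure limits."""
        now = time.monotonic() if now is None else now
        return (self.active < self.max_connections
                and self._recent_failures(now) <= self.max_failures_per_minute)
```

With a tracker like this, a source that exceeds its failure budget could be treated as unhealthy, and the task returned with input missing, instead of being retried indefinitely.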