Open chris-b1 opened 2 years ago
This is something we should have somewhere, but not necessarily in Dagger's core, since we may want more "supervisory actions" than just immediate and delay-based retries. For example, we might want to trigger a retry based on an active signal (such as an error asynchronously delivered via a library API). Or we might want to retry with backoff, or run a more complicated set of failure recovery steps that depends on the state of multiple thunks.
Instead of building this in directly, this functionality could be implemented with a supervisor thunk that launches and monitors flaky_function:
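using Dagger

# Supervisor thunk: runs `f(args...)` as a child thunk, retrying up to 3 times
# with a fixed 1-second delay between attempts.
function supervisor(f, args...)
    for i in 1:3
        try
            # Launch the supervised function as its own thunk and wait for it.
            return fetch(Dagger.@spawn f(args...))
        catch
            if i == 3
                # Out of attempts: propagate the last failure to the caller.
                rethrow()
            else
                @debug "Failed to execute $f on iteration $i, retrying in 1 second..."
                sleep(1)
            end
        end
    end
end

# Example function that fails roughly half of the time.
function flaky_function(x, y, z)
    if rand() < 0.5
        return x + y + z
    else
        error("Transient error")
    end
end

fetch(Dagger.@spawn supervisor(flaky_function, 1, 2, 3))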
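The same pattern extends to the "retry with backoff" case mentioned earlier. The following is only a rough sketch under stated assumptions: `supervise_with_backoff`, its keyword names (`retries`, `base_delay`, `factor`), and the `launch` hook are all hypothetical and are not part of Dagger. By default `launch` calls the function directly so that the sketch runs without Dagger. Inside a supervisor thunk one would presumably pass something like `(g, a...) -> fetch(Dagger.@spawn g(a...))` instead.

```julia
# Hypothetical helper (not a Dagger API): retry `f(args...)` with exponential backoff.
# `launch` decides how a single attempt is executed; the default is a plain call.
function supervise_with_backoff(f, args...; retries=3, base_delay=1.0, factor=2.0,
                                launch=(g, a...) -> g(a...))
    retries >= 1 || throw(ArgumentError("retries must be at least 1"))
    delay = base_delay
    for attempt in 1:retries
        try
            return launch(f, args...)
        catch
            # Last attempt: propagate the failure instead of sleeping again.
            attempt == retries && rethrow()
            @debug "Attempt $attempt of $f failed, retrying in $delay seconds..."
            sleep(delay)
            delay *= factor  # grow the wait before the next attempt
        end
    end
end

# Deterministic stand-in for a flaky function: fails on the first two calls.
calls = Ref(0)
function fails_twice(x, y, z)
    calls[] += 1
    calls[] < 3 && error("Transient error")
    return x + y + z
end

result = supervise_with_backoff(fails_twice, 1, 2, 3; base_delay=0.01)
```

Keeping the launch step pluggable is one possible design choice. It would let the same retry policy be exercised locally and then reused inside a Dagger thunk without changing the policy code.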
We could put such supervisor functions into their own package (which could be a subpackage of this repo), maybe DaggerSupervisors.jl?
This may or may not make sense at the Dagger level, but for consideration, here is an example that copies the Prefect keywords below