Open cvgaviao opened 6 days ago
Without knowing the details: You can use listeners to notify your monitoring tool of Failsafe events. You can provide a computed delay function to a RetryPolicy that calculates the next delay based on runtime state. You can't change the duration of a Timeout, but you could compose a Timeout within a RetryPolicy to get something like the effect of a dynamically configurable Timeout.
Or you could consider standalone execution, just using Failsafe to record attempts, but controlling the flow yourself.
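To make the "control the flow yourself" idea concrete, here is a minimal plain-Java sketch. It deliberately does not use the Failsafe API. The class `RuntimeTunableRetry`, its `Settings` holder and the `update` method are all hypothetical names invented for illustration. The point is only that the retry limit and the delay are re-read from shared state before every retry decision, so something outside the loop (for example a thread polling your monitoring tool) can lengthen the delay, raise the limit, or set the limit to 0 to go straight to the fallback.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: retry settings that can be swapped while a retry loop is running. */
final class RuntimeTunableRetry {

    /** Immutable snapshot of the tunable settings (illustrative names, not Failsafe types). */
    static final class Settings {
        final int maxRetries;
        final Duration delay;

        Settings(int maxRetries, Duration delay) {
            this.maxRetries = maxRetries;
            this.delay = delay;
        }
    }

    // Shared runtime state; another thread may replace it at any time via update().
    private final AtomicReference<Settings> settings;

    RuntimeTunableRetry(Settings initial) {
        this.settings = new AtomicReference<>(initial);
    }

    void update(Settings next) {
        settings.set(next);
    }

    /** Runs the task, re-reading the settings after each failure. */
    <T> T call(Callable<T> task, Callable<T> fallback) throws Exception {
        int retries = 0;
        while (true) {
            try {
                return task.call();
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    throw e; // do not retry when the thread was interrupted
                }
                Settings current = settings.get(); // picks up runtime changes
                if (retries >= current.maxRetries) {
                    // Retries exhausted, or cancelled by setting maxRetries to 0.
                    return fallback.call();
                }
                retries++;
                Thread.sleep(current.delay.toMillis());
            }
        }
    }
}
```

If you stay with Failsafe policies instead, the same `AtomicReference` lookup could presumably live inside the computed delay function mentioned above; check the exact signature for the Failsafe version you are on.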
I tried to ask this question in the Slack workspace, but I couldn't log in there. It seems that we need an invite.
I've been using Failsafe's Retry, Timeout and Fallback for some time, and they are working great for the majority of my team's use cases.
But I have a use case where they don't fit our needs.
We have some process chains that depend on another team's jobs, which send us (over SSH) files that we need to use. Sometimes those files are not sent on time, so a failure occurs and the step is retried for a limited time. But sometimes the resolution at the origin takes longer than the values we set for retries and timeout.
I'm thinking that one solution would be to increase the timeout and the delay between retries, or even cancel the retries (and use the Fallback), but at runtime.
We could notify our monitoring tool through its API when an error has occurred, and also implement a REST endpoint to retrieve parameters, but how could we have the Timeout and Retry policies modified and applied after they have started?
Ideas and alternatives are welcome too :)