Closed thebigw4lrus closed 1 year ago
I think you can achieve this behavior using Cool off time
The stoplight will move into the yellow state after being in the red state for a while. ... When stoplights are yellow, they will try to run their code. If it fails, they'll switch back to red. If it succeeds, they'll switch to green.
Cool off time, as I understand it, is the time it takes to transition from red to yellow. I was talking about measuring the number of failures within a time window to decide the transition from green to red. Imagine a situation where you have occasional failures that are not relevant because traffic is really high. In this scenario, it would be useful to have a time window in which we can evaluate whether a downstream service is really down, rather than relying on a counter alone.
Am I understanding this right?
That makes sense to me!
Most likely I'd prefer to have both behaviors at the same time -- to disable a service it either has to fail X times in a row or Y times (probably better to express it in percentage) in Z seconds.
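A minimal sketch of how the combined rule could look, assuming a plain in-memory tracker. `FailureTracker` and its parameters are hypothetical names for illustration and are not part of Stoplight's API. It uses an absolute count within the window rather than a percentage, which is a simplification.

```ruby
# Hypothetical sketch, not Stoplight's API: decides when a light should turn red.
class FailureTracker
  def initialize(max_consecutive:, max_in_window:, window_seconds:)
    @max_consecutive = max_consecutive # X failures in a row
    @max_in_window = max_in_window     # Y failures ...
    @window_seconds = window_seconds   # ... within Z seconds
    @consecutive = 0
    @failure_times = []
  end

  # Record a failed call at time `now` (in seconds).
  def record_failure(now)
    @consecutive += 1
    @failure_times << now
    prune(now)
  end

  # A success breaks the streak but leaves the window untouched.
  def record_success
    @consecutive = 0
  end

  # True when either rule says the service should be disabled.
  def tripped?(now)
    prune(now)
    @consecutive >= @max_consecutive || @failure_times.size >= @max_in_window
  end

  private

  # Drop failures that fell out of the time window.
  def prune(now)
    cutoff = now - @window_seconds
    @failure_times.reject! { |t| t <= cutoff }
  end
end
```

With this shape, occasional failures interleaved with successes never build a streak, yet they still trip the light if enough of them land inside the same window.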
Hi @bolshakov! I have a PR based on what you did here: https://github.com/orgsync/stoplight/pull/132
Maybe I can help you finish this feature. I tried to push to the branch, but I don't have permission. Can I help? If so, I will gladly put it up for review. :)
Thanks!
@thebigw4lrus any help would be much appreciated, since I have limited time for this. I granted you access to push to the branch (please check your email).
And there are several unhandled issues left which must be resolved before merging:
I don't want to rush pushing the gem until it's thoroughly tested. Feel free to test it in your system, if you're brave enough :)
@thebigw4lrus feel free to reach out to me if you need help with that! Thanks for your collaboration!
Hi @bolshakov, can you grant me push access to the branch again? I had to change my GitHub email because of an issue, and I can't reach the email that GitHub sent me. Thanks!
Context
Some services fail from time to time. Under high traffic, these occasional failures do not cause a real disruption. In this kind of scenario, the real problem arises when the downstream service fails a lot in a short period of time.
Problem
This gem keeps a `threshold` key in the storage that counts how many times the aforementioned service has failed, but it does not take a time window into account.
Proposal
Introduce a way to reset the threshold after X seconds have passed. In Redis, this can be modeled via a TTL.
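As a rough illustration of the TTL idea, here is an in-memory stand-in for a counter key that expires. `ExpiringCounter` and its `clock:` parameter are hypothetical and only mimic the semantics; with a real Redis store this would presumably map to an `INCR` on the key plus an `EXPIRE` set when the key is first created.

```ruby
# Hypothetical in-memory stand-in for a Redis counter key with a TTL.
class ExpiringCounter
  # `clock` is injectable so the behavior can be exercised without sleeping.
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl_seconds = ttl_seconds
    @clock = clock
    @count = 0
    @expires_at = nil
  end

  # Mirrors INCR, with the expiry set only on the first increment of a window.
  def increment
    expire_if_due
    @count += 1
    @expires_at ||= @clock.call + @ttl_seconds
    @count
  end

  # Current failure count, or 0 once the TTL has elapsed.
  def value
    expire_if_due
    @count
  end

  private

  # Reset the counter when its TTL has passed, like an expired key.
  def expire_if_due
    return if @expires_at.nil? || @clock.call < @expires_at

    @count = 0
    @expires_at = nil
  end
end
```

Because the expiry is set only when the counter is created, this behaves as a fixed window: failures are counted for X seconds after the first one, then the count starts over.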