build-canaries / nevergreen

:baby_chick: A build monitor with attitude
https://nevergreen.io
Eclipse Public License 1.0
215 stars 38 forks source link

Add a deadline timeout when calling Nevergreen server -> CI server #254

Closed GentlemanHal closed 5 years ago

GentlemanHal commented 6 years ago

The current timeouts when calling Nevergreen server -> CI server are for establishing the initial connection and for receiving each packet of data. Whereas the timeouts when calling the UI -> Nevergreen server are for receiving the first packet of data and a deadline for the entire response to complete.

This means if the CI server is responding slowly the Nevergreen server wouldn't timeout but the UI would. Given we've added automatic retries as part of #238 and the monitor view could be polling every 5 seconds, this could result in loads of extra calls to a CI server already responding slowly.

We should add a deadline timeout to the CI server calls that matches the timeout of the UI. This would mean the server timeouts at the same time as the UI, and we wouldn't be stacking calls to a probably broken CI server.

Unfortunately this isn't simple with clj-http, which is the library we are using to make http calls, it requires adding a manual timer and aborting the request if it takes too long as per this example: https://github.com/dakrone/clj-http#cancelling-requests

jainsahab commented 5 years ago

Hi @GentlemanHal , I was thinking of solving this issue using circuit breaker pattern. We can break the circuit from nevergreen -> ci server after a certain threshold is reached and when UI is polling it'll get the same response for some time until the circuit is established again. Let me know your thoughts on this and then I can pick this one up.

GentlemanHal commented 5 years ago

Hi @jainsahab , thanks for offering to help!

My first thought is the circuit breaker pattern might be overkill for the issue. I'd be interested in hearing some specific implementations details however.

Some things to consider:

I'm thinking that adding the deadline timeout as originally stated (or finding an alternative library that provides the functionality out the box) is the way to close this issue, and maybe doing something more complicated like circuit breaking to solve issue #199.

jainsahab commented 5 years ago

sure @GentlemanHal , Yes. Having a circuit breaker would add the overhead of maintaining persistent storage. and it would be a bit difficult to maintain when having multiple CIs configured on UI.

then, I'll go with the originally stated solution by you. Thanks :)

GentlemanHal commented 5 years ago

Done via pull request #293