Netflix / Hystrix

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.

Model semaphore metrics as a stream #983

Open mattrjacobs opened 9 years ago

mattrjacobs commented 9 years ago

See motivations in #943. This builds upon work in #981.

There are 2 types of semaphore metrics:

1) Events - semaphore claimed, semaphore rejected, semaphore returned
2) Utilization - only valid if sampled

We can recover the events from the Command stream in #981 by adding semaphore info to each event. This would allow us to avoid the cost of maintaining separate paths for writing these metrics.
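As a rough illustration of recovering semaphore events from a single command stream, here is a minimal sketch in plain Java. It does not use Hystrix's API: `CommandEvent`, `SemaphoreOutcome`, `SemaphoreMetric`, and `count` are hypothetical names, and the `completed` flag is an assumed stand-in for "the command finished and gave its permit back".

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT Hystrix's API: each command event carries the
// outcome of its semaphore acquisition attempt.
class SemaphoreEventSketch {

    enum SemaphoreOutcome { CLAIMED, REJECTED, NOT_APPLICABLE }

    enum SemaphoreMetric { CLAIMED, REJECTED, RETURNED }

    static final class CommandEvent {
        final String commandKey;
        final SemaphoreOutcome semaphore;
        final boolean completed; // assumed: true once the command finished and released its permit

        CommandEvent(String commandKey, SemaphoreOutcome semaphore, boolean completed) {
            this.commandKey = commandKey;
            this.semaphore = semaphore;
            this.completed = completed;
        }
    }

    // Derives the three semaphore event counts from command events alone,
    // so no separate write path for semaphore metrics is needed.
    static Map<SemaphoreMetric, Integer> count(List<CommandEvent> events) {
        Map<SemaphoreMetric, Integer> counts = new EnumMap<>(SemaphoreMetric.class);
        for (SemaphoreMetric m : SemaphoreMetric.values()) {
            counts.put(m, 0);
        }
        for (CommandEvent e : events) {
            if (e.semaphore == SemaphoreOutcome.REJECTED) {
                counts.merge(SemaphoreMetric.REJECTED, 1, Integer::sum);
            } else if (e.semaphore == SemaphoreOutcome.CLAIMED) {
                counts.merge(SemaphoreMetric.CLAIMED, 1, Integer::sum);
                if (e.completed) {
                    counts.merge(SemaphoreMetric.RETURNED, 1, Integer::sum);
                }
            }
            // NOT_APPLICABLE (e.g. thread-isolated commands) contributes nothing.
        }
        return counts;
    }

    public static void main(String[] args) {
        List<CommandEvent> stream = Arrays.asList(
            new CommandEvent("A", SemaphoreOutcome.CLAIMED, true),
            new CommandEvent("B", SemaphoreOutcome.REJECTED, false),
            new CommandEvent("C", SemaphoreOutcome.CLAIMED, false));
        System.out.println(count(stream));
    }
}
```

In this sketch, the difference between the claimed and returned counts is the number of permits still held.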

#981 also provides the start of a first-class sample stream, which is a better way to model sampling-based metrics.
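To make the sampling idea concrete, here is a minimal sketch of a utilization sampler in plain Java. It is not the sample stream from #981: `Sampler`, `Sample`, and `peak` are hypothetical names, and the sketch assumes some external scheduler would call `sample` at a fixed interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical sketch, NOT Hystrix's API: utilization is only meaningful
// as a series of point-in-time samples of "permits currently in use".
class UtilizationSampleSketch {

    static final class Sample {
        final long timestampMillis;
        final int inUse;
        final int maxPermits;

        Sample(long timestampMillis, int inUse, int maxPermits) {
            this.timestampMillis = timestampMillis;
            this.inUse = inUse;
            this.maxPermits = maxPermits;
        }
    }

    static final class Sampler {
        private final IntSupplier inUse;
        private final int maxPermits;
        private final List<Sample> samples = new ArrayList<>();

        Sampler(IntSupplier inUse, int maxPermits) {
            this.inUse = inUse;
            this.maxPermits = maxPermits;
        }

        // Assumed to be invoked periodically by a scheduler; reads the
        // current value once and records it.
        Sample sample(long nowMillis) {
            Sample s = new Sample(nowMillis, inUse.getAsInt(), maxPermits);
            samples.add(s);
            return s;
        }

        // Highest observed number of permits in use; 0 if nothing was sampled.
        int peak() {
            int max = 0;
            for (Sample s : samples) {
                max = Math.max(max, s.inUse);
            }
            return max;
        }
    }

    public static void main(String[] args) {
        AtomicInteger permitsInUse = new AtomicInteger();
        Sampler sampler = new Sampler(permitsInUse::get, 10);

        permitsInUse.set(3);
        sampler.sample(1000L);
        permitsInUse.set(7);
        sampler.sample(2000L);
        permitsInUse.set(2);
        sampler.sample(3000L);

        System.out.println("peak in use: " + sampler.peak());
    }
}
```

Anything that happens between two samples is invisible here, which is why the event counts and the sampled utilization are modeled as separate kinds of metric.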

rpalcolea commented 7 years ago

@mattrjacobs Are there any plans to implement this, or to add currentConcurrentExecutionCount to the Hystrix dashboard?

I'm asking because we had an issue last night with Hystrix holding long-lived semaphores, and it was caused by the RxJava zip operator. It was similar to https://github.com/Netflix/Hystrix/issues/901

It took some time to determine this because everything looked healthy in our dashboard :)