Netflix / Hystrix

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.

Hystrix dashboard wrong latency metrics. #1987

Open GiteshKhannaOYO opened 4 years ago

GiteshKhannaOYO commented 4 years ago

I have observed that on the Hystrix dashboard, the p90 and p99 latencies shown for a service are always lower than the timeout set for the Hystrix command, even though the external service actually has a much higher p90 and p99.

My understanding is that Hystrix uses two threads: one for the timeout/fallback and another for executing the service call. The observation above suggests that whenever a timeout occurs, the timeout/fallback thread emits a timeout event, which is also used for the latency metrics, while the thread that is executing the code sends no event to the stream, which leads to wrong latency metrics. Is this true?
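
The effect described above can be reproduced without Hystrix. The following is a minimal standalone sketch in plain Java, not Hystrix code: the class name `CappedLatencyDemo`, the sample latencies, and the 1000 ms timeout are all made up for illustration. It assumes, as hypothesized in the question, that a timed-out call is recorded with the time at which the caller stopped waiting rather than the time the downstream call really took.

```java
import java.util.Arrays;

// Hypothetical illustration only; this does not use or reproduce Hystrix internals.
public class CappedLatencyDemo {

    // Nearest-rank percentile computed over a sorted copy of the samples.
    public static long percentile(long[] samples, int pct) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(0, rank - 1)];
    }

    // Latency as seen by a caller that gives up after timeoutMs (assumed behavior).
    public static long[] capAtTimeout(long[] trueLatenciesMs, long timeoutMs) {
        long[] recorded = new long[trueLatenciesMs.length];
        for (int i = 0; i < trueLatenciesMs.length; i++) {
            recorded[i] = Math.min(trueLatenciesMs[i], timeoutMs);
        }
        return recorded;
    }

    public static void main(String[] args) {
        long timeoutMs = 1000; // made-up command timeout
        // Made-up latencies of the external service, in milliseconds.
        long[] trueLatenciesMs = {200, 300, 400, 500, 600, 700, 800, 1500, 2500, 4000};
        long[] recordedMs = capAtTimeout(trueLatenciesMs, timeoutMs);

        System.out.println("true p90     = " + percentile(trueLatenciesMs, 90));
        System.out.println("true p99     = " + percentile(trueLatenciesMs, 99));
        System.out.println("recorded p90 = " + percentile(recordedMs, 90));
        System.out.println("recorded p99 = " + percentile(recordedMs, 99));
    }
}
```

Under this assumption the recorded percentiles can never exceed the timeout, no matter how slow the external service really is, which matches what the dashboard shows. Whether Hystrix actually records timed-out executions this way is exactly the open question here.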