Not sure this belongs here or in Hystrix, but we need a rate limiter that sheds load when latencies shoot up. Rate limiting needs to be applied per operation.
Proposed implementation
Use a modified leaky bucket to limit the total time allotted to the client for accessing a service (see the sketch after the steps below).
Initialize the limiter with an expected historical average latency (L_avg) and request rate (Rate_avg)
Set the per-second token budget: T_max = L_avg * Rate_avg
Initialize the token count to zero: T_current = 0
Each operation request acquires L_avg from the limiter: T_current += L_avg
If the allotment is exceeded (T_current > T_max), return the tokens to the count and reject the operation: T_current -= L_avg
Upon completion of the operation, acquire the excess (or return the deficit) relative to the estimate: T_current += (L_actual - L_avg)
Every second (or at a finer interval) drain tokens: T_current -= T_interval, where T_interval is the interval's share of T_max, flooring at 0
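A minimal sketch of the bucket described above, assuming it is wired in per operation (tryAcquire before the call, onComplete after, drain on a scheduler). All class and method names are illustrative, not an existing Hystrix API, and the synchronization is the simplest thing that works rather than a tuned implementation.

```java
/**
 * Sketch of the modified leaky bucket: T_current tracks time handed out to
 * in-flight/completed operations, drains at a fixed rate, and is capped by
 * T_max = L_avg * Rate_avg. Names and structure are illustrative only.
 */
public class LatencyBudgetLimiter {

    private final double avgLatencyMs;   // L_avg: expected historical average latency
    private final double maxTokens;      // T_max = L_avg * Rate_avg
    private double currentTokens = 0.0;  // T_current, starts at zero

    public LatencyBudgetLimiter(double avgLatencyMs, double avgRequestsPerSecond) {
        this.avgLatencyMs = avgLatencyMs;
        this.maxTokens = avgLatencyMs * avgRequestsPerSecond;
    }

    /** Each request acquires L_avg; if the allotment is exceeded, return the tokens and reject. */
    public synchronized boolean tryAcquire() {
        currentTokens += avgLatencyMs;
        if (currentTokens > maxTokens) {
            currentTokens -= avgLatencyMs;  // return tokens to the count
            return false;                   // shed load
        }
        return true;
    }

    /** On completion, charge the excess (or credit the deficit) between actual and average latency. */
    public synchronized void onComplete(double actualLatencyMs) {
        currentTokens += (actualLatencyMs - avgLatencyMs);
    }

    /** Called every second (or at a finer interval): drain the interval's share of T_max. */
    public synchronized void drain(double intervalSeconds) {
        currentTokens = Math.max(0.0, currentTokens - maxTokens * intervalSeconds);
    }
}
```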
The above algorithm may give too much weight to L_actual if there is an extreme outlier. This can be mitigated by using an exponential moving average of recorded latencies to smooth out occasional outliers. To adapt to changing latency trends, the system may also use a longer exponential average to adjust L_avg.
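As a rough illustration of that smoothing, the sketch below keeps two exponential averages: a shorter one to dampen individual outliers before they hit the bucket, and a longer one that could periodically replace L_avg. The class name, method names, and the idea of choosing two alpha values are assumptions for the example, not part of the proposal above.

```java
/** Illustrative exponential averaging of observed latencies (names and alphas are assumptions). */
public class LatencyEstimator {

    private final double shortAlpha;  // larger alpha: tracks recent latency, smooths single outliers
    private final double longAlpha;   // smaller alpha: slow-moving trend used to adjust L_avg
    private double smoothedLatencyMs; // short EMA of L_actual
    private double trendLatencyMs;    // long EMA, candidate replacement for L_avg

    public LatencyEstimator(double initialLatencyMs, double shortAlpha, double longAlpha) {
        this.smoothedLatencyMs = initialLatencyMs;
        this.trendLatencyMs = initialLatencyMs;
        this.shortAlpha = shortAlpha;
        this.longAlpha = longAlpha;
    }

    /** Record one observed latency; the EMAs absorb extreme outliers rather than passing them through. */
    public synchronized void record(double actualLatencyMs) {
        smoothedLatencyMs = shortAlpha * actualLatencyMs + (1 - shortAlpha) * smoothedLatencyMs;
        trendLatencyMs   = longAlpha  * actualLatencyMs + (1 - longAlpha)  * trendLatencyMs;
    }

    /** Value to feed into the limiter's completion step in place of the raw L_actual. */
    public synchronized double smoothedLatency() { return smoothedLatencyMs; }

    /** Slowly adapting value the limiter could periodically adopt as its L_avg. */
    public synchronized double trendLatency() { return trendLatencyMs; }
}
```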