Add a CircularBuffer in AgentSink

jaredcnance commented 4 years ago

Description

If the agent is down or has not started, metrics can be dropped. It is currently up to the caller of logger.flush to handle retries. There are two options:

- Backpressure the caller of logger.flush. This could negatively impact request latencies.
- On error, enqueue to a circular buffer. The catch is that the queue must be retried on an interval, which changes the model from async/await to a purely asynchronous one. This departs from the current design and would need to be enabled via a feature flag.
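
To make the second option concrete, here is a minimal sketch of a fixed-capacity circular buffer that overwrites the oldest entry when full. The class shape and the method names (enqueue, dequeue, size) are assumptions for illustration only, not an existing API in this library.

```typescript
// Hypothetical fixed-capacity ring buffer; not part of the library today.
class CircularBuffer<T> {
  private readonly items: (T | undefined)[];
  private head = 0; // index of the oldest item
  private count = 0; // number of items currently stored

  constructor(private readonly capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.items = new Array<T | undefined>(capacity).fill(undefined);
  }

  get size(): number {
    return this.count;
  }

  // Adds an item. When the buffer is full, the oldest item is
  // overwritten and returned so the caller can count dropped metrics.
  enqueue(item: T): T | undefined {
    let evicted: T | undefined;
    if (this.count === this.capacity) {
      evicted = this.items[this.head];
      this.items[this.head] = item;
      this.head = (this.head + 1) % this.capacity;
    } else {
      const tail = (this.head + this.count) % this.capacity;
      this.items[tail] = item;
      this.count++;
    }
    return evicted;
  }

  // Removes and returns the oldest item, or undefined when empty.
  dequeue(): T | undefined {
    if (this.count === 0) return undefined;
    const item = this.items[this.head];
    this.items[this.head] = undefined;
    this.head = (this.head + 1) % this.capacity;
    this.count--;
    return item;
  }
}
```

Dropping the oldest entry bounds memory while the agent is unreachable, at the cost of losing the earliest metrics once the buffer fills.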

The symptoms of this are:

- The first metrics during initialization of the app may not appear
- The following error message will be in your app logs:

(node:1) UnhandledPromiseRejectionWarning: Error: connect ECONNREFUSED 172.17.0.2:25888
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1106:14)

Tasks

- Add type AgentSinkOptions with:
  - A RetryStrategy parameter whose default value is None for backwards compatibility, with a single option to start with: ExponentialBackoffRetryStrategy (see also: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/).
  - An AsyncBehavior parameter that controls whether the call blocks. If it blocks, we keep the current behavior; if not, we return immediately, enqueuing to the retry buffer on failure.
- Change AgentSink's constructor to constructor(options: AgentSinkOptions, serializer: ISerializer).
- Add RetryStrategies, which the AgentSink uses based on its configuration. NoRetry propagates errors back to the caller of flush, which maintains the current behavior. ExponentialRetry (which can be configured by the application) will block flush on the first attempt, enqueuing to a CircularBuffer (whose size is also configurable) on failure.
- On startup, call setInterval to periodically check the size of the CircularBuffer and retry failed requests asynchronously.
- Add a shutdown method that shuts down gracefully and blocks on any outstanding requests.
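
As a rough illustration of the first task, the options type and a backoff delay calculation might look like the sketch below. The string values, the bufferSize field, its default of 100, and the helper names resolveOptions and backoffDelayMs are all assumptions. The delay formula is the "full jitter" variant described in the linked AWS article.

```typescript
// Hypothetical option types modeled on the task list; names are illustrative.
type RetryStrategy = "None" | "ExponentialBackoff";
type AsyncBehavior = "Blocking" | "NonBlocking";

interface AgentSinkOptions {
  retryStrategy?: RetryStrategy; // "None" keeps today's behavior
  asyncBehavior?: AsyncBehavior; // "Blocking" keeps today's behavior
  bufferSize?: number; // capacity of the retry buffer (assumed field)
}

// Fills in defaults so existing callers see no change in behavior.
function resolveOptions(
  options: AgentSinkOptions = {},
): Required<AgentSinkOptions> {
  return {
    retryStrategy: options.retryStrategy ?? "None",
    asyncBehavior: options.asyncBehavior ?? "Blocking",
    bufferSize: options.bufferSize ?? 100, // assumed default
  };
}

// "Full jitter" backoff: a uniformly random delay in
// [0, min(capMs, baseMs * 2^attempt)). `random` is injectable for tests.
function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Randomizing the whole delay, rather than adding a small jitter to a fixed delay, spreads out retries from many clients that failed at the same moment.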

Example Usage

AWS_EMF_AGENT_RETRY_STRATEGY="ExponentialBackoff"
// or
Configuration.agentRetryStrategy = RetryStrategy.ExponentialBackoff;
// or 
Configuration.agentRetryStrategy = (...) => customRetryStrategy();

// ...
await logger.flush();
// execution control is returned when logs have been successfully flushed or enqueued for retry
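
The flush semantics described in the comment above, together with the interval retry and shutdown tasks, could be sketched as follows. This is not the AgentSink implementation: RetryingSink, the injected Send function, and every method name here are hypothetical, and a plain bounded array stands in for the CircularBuffer.

```typescript
// Hypothetical transport: resolves on success, rejects when the agent is down.
type Send = (payload: string) => Promise<void>;

class RetryingSink {
  private pending: string[] = []; // bounded retry queue, oldest first
  private timer: ReturnType<typeof setInterval> | undefined;
  private inFlight: Promise<void> | undefined;

  constructor(
    private readonly send: Send,
    private readonly maxPending = 100,
    private readonly retryIntervalMs = 1000,
  ) {}

  get pendingCount(): number {
    return this.pending.length;
  }

  // Starts the periodic retry of buffered payloads.
  start(): void {
    if (this.timer === undefined) {
      this.timer = setInterval(() => void this.retryPending(), this.retryIntervalMs);
    }
  }

  // Blocks on the first attempt; on failure, buffers instead of throwing.
  async flush(payload: string): Promise<void> {
    try {
      await this.send(payload);
    } catch {
      if (this.pending.length >= this.maxPending) this.pending.shift(); // drop oldest
      this.pending.push(payload);
    }
  }

  // Retries buffered payloads; concurrent callers share one drain pass.
  retryPending(): Promise<void> {
    if (this.inFlight === undefined) {
      const clear = (): void => {
        this.inFlight = undefined;
      };
      this.inFlight = this.drain().then(clear, clear);
    }
    return this.inFlight;
  }

  // Sends in order and stops at the first failure, keeping the payload if there is room.
  private async drain(): Promise<void> {
    while (this.pending.length > 0) {
      const payload = this.pending.shift() as string;
      try {
        await this.send(payload);
      } catch {
        if (this.pending.length < this.maxPending) this.pending.unshift(payload);
        return; // agent still unavailable; wait for the next tick
      }
    }
  }

  // Stops the timer and makes one final attempt to send what is buffered.
  async shutdown(): Promise<void> {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    await this.retryPending();
  }
}
```

In this sketch flush never rejects, so a caller that awaits it regains control once the payload is either sent or queued. Payloads that are still failing at shutdown stay in the queue and are lost when the process exits.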

Open Question

Should we change logger.flush() to enqueue and return immediately? This would allow us to make flush() a synchronous operation in all cases.

davidtheclark commented 4 years ago

@jaredcnance is there any likelihood that this feature will be released in the near future?

jaredcnance commented 4 years ago

@davidtheclark the current priority for this repo is #54 and then this issue. The earliest I could probably complete this issue would be ~5-6 weeks from now.

heldersepu commented 1 year ago

Any updates on this?

awslabs / aws-embedded-metrics-node