cloudfoundry-attic / datadog-firehose-nozzle

OBSOLETE CF component to forward metrics from the Loggregator Firehose to DataDog
Apache License 2.0

deal with high network latency #10

Closed: funcode closed this issue 8 years ago

funcode commented 8 years ago

We are sending metrics to Datadog from AWS China, which has high network latency to the Datadog API endpoint. The datadog-firehose-nozzle does not seem to perform well under these conditions, and Datadog loses metrics. We do not believe this is a Datadog issue, because we got pretty good results from JMX Bridge, which sends metrics using the Datadog client's JMX integration. Here are some logs:

2016/06/27 07:24:53 Targeting datadog API URL: https://app.datadoghq.com/api/v1/series
2016/06/27 07:24:53 Starting DataDog Firehose Nozzle...
2016/06/27 07:25:08 Posting 658 metrics
2016/06/27 07:25:21 FATAL ERROR: datadog request returned HTTP response: 400 Bad Request

2016/06/27 07:25:24 Targeting datadog API URL: https://app.datadoghq.com/api/v1/series
2016/06/27 07:25:24 Starting DataDog Firehose Nozzle...
2016/06/27 07:25:39 Posting 678 metrics
2016/06/27 07:25:54 Posting 639 metrics
2016/06/27 07:26:09 Posting 668 metrics
2016/06/27 07:26:29 FATAL ERROR: datadog request returned HTTP response: 400 Bad Request

2016/06/27 07:26:35 Targeting datadog API URL: https://app.datadoghq.com/api/v1/series
2016/06/27 07:26:35 Starting DataDog Firehose Nozzle...
2016/06/27 07:26:50 Posting 660 metrics
2016/06/27 07:27:04 FATAL ERROR: datadog request returned HTTP response: 400 Bad Request

2016/06/27 07:27:06 Targeting datadog API URL: https://app.datadoghq.com/api/v1/series
2016/06/27 07:27:06 Starting DataDog Firehose Nozzle...
2016/06/27 07:27:21 Posting 699 metrics
2016/06/27 07:27:36 Posting 654 metrics
2016/06/27 07:27:49 FATAL ERROR: datadog request returned HTTP response: 400 Bad Request

2016/06/27 07:27:57 Targeting datadog API URL: https://app.datadoghq.com/api/v1/series
2016/06/27 07:27:57 Starting DataDog Firehose Nozzle...
2016/06/27 07:28:12 Posting 719 metrics
2016/06/27 07:28:23 FATAL ERROR: Post https://app.datadoghq.com/api/v1/series?api_key=xxx: net/http: TLS handshake timeout

cf-gitbot commented 8 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/126110907

The labels on this github issue will be updated when the story is started.

wfernandes commented 8 years ago

@funcode Apologies for the late response. I'll discuss with our PM whether this story needs to be prioritized.

poy commented 8 years ago

The options NOZZLE_DATADOGTIMEOUTSECONDS (environment variable) and DataDogTimeoutSeconds (config file) were added to set a timeout for the POST to DataDog. This way the nozzle will time out rather than freeze on a high-latency POST.