Closed cfunkhouser closed 4 years ago
Merging #39 into master will decrease coverage by
0.33%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #39 +/- ##
==========================================
- Coverage 89.16% 88.83% -0.34%
==========================================
Files 12 12
Lines 572 573 +1
==========================================
- Hits 510 509 -1
- Misses 46 48 +2
Partials 16 16
Impacted Files | Coverage Δ | |
---|---|---|
internal/metrics/api.go | 90.00% <100.00%> (+0.16%) |
:arrow_up: |
internal/metrics/processor.go | 94.44% <0.00%> (-2.78%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 2dd03ad...6e22a7e. Read the comment docs.
Oh hey Funk! Good catch. I'll start figuring out how we want to handle API timeouts internally.
This is partially why we're pushing usage of the DD_FLUSH_TO_LOG
option, wherein the metrics are written to CloudWatch logs ~instantly, and are picked up by the AWS Forwarder lambda: https://github.com/DataDog/datadog-serverless-functions/tree/master/aws/logs_monitoring
For sure! However, in some cases we want to avoid CloudWatch because of the added latency primarily, and additional costs secondarily.
Potentially concerning: in internal/metrics/processor.go
we implement some retry logic on sending metrics. If the Connection timed out and the right flags were set, we might spend up to 15 seconds trying to send metrics. Better than forever, but not ideal. Fortunately shouldRetryOnFail
is set to false
by default. I'll add a few tasks:
What does this PR do?
Set a ten second timeout on the
http.Client
used to communicate with the DataDog API.Motivation
The Go HTTP Client specifies no timeout by default, which means - theoretically - a request could hang indefinitely under unfortunate server conditions. This is especially unfortunate given this library is designed to run in Lambda, where execution time is money.
I chose 10 seconds somewhat arbitrarily. I'm open to other values, or even to making it tunable. My hope is that this PR will start a conversation, more than just being accepted. I have a poor understanding of what the Datadog API latency guarantees are around the globe.
Testing Guidelines
I
go test
ed it. I also used it on a test lambda internally. It works exactly as expected under non-pathological cases.Additional Notes
Types of changes
Checklist