DataDog / datadog-lambda-extension

Datadog Lambda Extension
Apache License 2.0

Failure to send telemetry on "warm start" and "shutdown" after timeout #195

Open Vladikamira opened 8 months ago

Vladikamira commented 8 months ago

Context: datadog-lambda-extension version 44; the Lambda function and the Datadog site are both located in the EU.

The problem: it looks like the extension starts a telemetry flush and does not finish it before the invocation ends, so the request is still pending when Lambda puts the environment into the IDLE state. We see this as WARN and ERROR messages the next time the environment runs, either on a subsequent warm-start invocation or on the shutdown event. The WARN/ERROR appears when more than the WaitTimeout (5 seconds) has passed since the previous invocation (I didn't dive deep into the code, but that looks like the relevant threshold).

Example of the WARN:

DD_EXTENSION | WARN | Could not send payload: Post "https://http-intake.logs.datadoghq.eu/api/v2/logs": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Example of the ERROR:

 | UTC | DD_EXTENSION | ERROR | Exporting failed. No more retries left. Dropping data.

Here are some examples in screenshots:

WARN on a subsequent invocation: [screenshots: "WARN on Invocation", "WARN (with debug) on invocation"]

WARN on the shutdown event: [screenshots: "WARN on shutdown", "WARN on shutdown with debug"]

ERROR on shutdown (this one seems to come from the OTLP part of the agent): [screenshot: "ERROR and shutdown event"]

Vladikamira commented 8 months ago

The problem also reproduces on datadog-lambda-extension version 53 (I cannot check 55 because the image is not published).

zARODz11z commented 8 months ago

Hi @Vladikamira, I'm able to find the v55 tag here: https://hub.docker.com/r/datadog/lambda-extension/tags. Could you clarify what you meant by "cannot check 55 as the image is not published"?

Vladikamira commented 8 months ago


Indeed, I will try the one from Docker Hub, thanks! 👍 We are using the AWS one (https://gallery.ecr.aws/datadog/lambda-extension), where the latest version is 53.

Vladikamira commented 8 months ago

Version 55 has the same problem:

[Screenshot, 2024-02-23 at 10:41:06]

hghotra commented 2 months ago

Hi @Vladikamira, can you try the latest version of the extension and report back? This was fixed in a recent release.

DylanLovesCoffee commented 2 months ago

Upgrading to v59+ should bring in a change that improves the flushing logic for sending logs. Could you give that a try and keep us updated?

Vladikamira commented 2 months ago

Thanks! Yup, I can no longer reproduce the issue on v64 🎉

But I do get new WARN messages now 😅:

2024-09-11 15:09:38 UTC | DD_EXTENSION | WARN | config key flare_stripped_keys is unknown
2024-09-11 15:09:38 UTC | DD_EXTENSION | WARN | failed to get configuration value for key "flare_stripped_keys": unable to cast <nil> of type <nil> to []string
2024-09-11 15:09:38 UTC | DD_EXTENSION | WARN | config key scrubber.additional_keys is unknown
2024-09-11 15:09:38 UTC | DD_EXTENSION | WARN | failed to get configuration value for key "scrubber.additional_keys": unable to cast <nil> of type <nil> to []string
Vladikamira commented 2 months ago

Anyway, this issue is fixed, so I'm closing it. Thanks! 🙏 I'll open another issue for the new warnings 😄

Vladikamira commented 2 months ago

Sorry, a correction: after the upgrade to v64 we no longer see these messages:

DD_EXTENSION | WARN | Could not send payload: Post "https://http-intake.logs.datadoghq.eu/api/v2/logs": EOF (Client.Timeout exceeded while awaiting headers)

but we still see these:

 DD_EXTENSION | WARN | SyncForwarder.sendHTTPTransactions failed to send: error while sending transaction, rescheduling it: Post "https://7-55-3-app.agent.datadoghq.eu/api/v1/series": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

[Screenshot, 2024-09-12 at 14:00:09]