DataDog / serilog-sinks-datadog-logs

Serilog Sink that sends log events to Datadog https://www.datadoghq.com/
Apache License 2.0
60 stars 42 forks source link

Let SeriLog handle retries #45

Closed IrisClasson closed 3 years ago

IrisClasson commented 4 years ago

Hello again and thanks for the work on the sink! We are quite keen on using this sink at work but have noticed a few things that brought up some questions/concerns.

Is there any particular reason why this sink handles the retries instead of letting Serilog manage them? There seems to be some side effects that can be avoided.

For example, CloseAndFlush on the logger doesn't seem to work which means the logs are lost if the application closes down before the batch is sent. We are assuming this is a side-effect of hijacking the retries.

A second problem is that the exceptions from bad requests are currently being swallowed. By letting Serilog do the retry the exception bubbles up, and is outputted to the SelfLog. This makes debugging easier, and avoids cluttering of the SelfLog bu adding manual entries. That seems to be the recommended way to do it.

I'll push a PR with an example.

ogaca-dd commented 4 years ago

@IrisClasson Thank you for reporting this issue.

Is there any particular reason why this sink handles the retries instead of letting Serilog manage them? There seems to be some side effects that can be avoided.

A second problem is that the exceptions from bad requests are currently being swallowed. By letting Serilog do the retry the exception bubbles up, and is outputted to the SelfLog. This makes debugging easier, and avoids cluttering of the SelfLog bu adding manual entries. That seems to be the recommended way to do it.

I need to gather some information to answer to these two questions.

For example, CloseAndFlush on the logger doesn't seem to work which means the logs are lost if the application closes down before the batch is sent. We are assuming this is a side-effect of hijacking the retries.

Can you explain in which case CloseAndFlush is not working?

Consider the following code:

            Log.Logger = new LoggerConfiguration()
                .WriteTo.DatadogLogs("API_KEY")
                .CreateLogger();

            Log.Information("Test serilog");
            Log.CloseAndFlush(); // This is line 60

I set a breakpoint at https://github.com/DataDog/serilog-sinks-datadog-logs/blob/ebe50a2205e1fe334416e0d33ea54c6987a1aac3/src/Serilog.Sinks.Datadog.Logs/Sinks/Datadog/DatadogHttpClient.cs#L108

Screen Shot 2020-08-21 at 11 29 28 AM

As you can see, the log is flushed when calling Log.CloseAndFlush(); which is located at line 60.

IrisClasson commented 4 years ago

Thanks for replying and investigating!

CloseAndFlush() wasn't consistently sending the entries, but when trying to reproduce this today I was unable to do so.

Did you have a chance to investigate why the sink doesn't let SeriLog handle the retries? If you make the request fail and enable SelfLog you won't see any exceptions outputted.

ogaca-dd commented 4 years ago

@IrisClasson ,

Our sink calls SelfLog when it cannot send a message. See https://github.com/DataDog/serilog-sinks-datadog-logs/blob/ebe50a2205e1fe334416e0d33ea54c6987a1aac3/src/Serilog.Sinks.Datadog.Logs/Sinks/Datadog/DatadogHttpClient.cs#L119

Consider the following code:

static void Main(string[] args)
{
    Serilog.Debugging.SelfLog.Enable(msg => Console.WriteLine(msg));
    Log.Logger = new LoggerConfiguration()
        .WriteTo.DatadogLogs(API_KEY, configuration: new DatadogConfiguration{ Url = "InvalidHost"})
        .CreateLogger();

    Log.Information("Test serilog");
    Log.CloseAndFlush();
}

It displays

2020-09-10T10:11:08.6100920Z Could not send payload to Datadog: [{"Timestamp":"2020-09-10T12:08:08.445887+02:00","level":"Information","MessageTemplate":"Test serilog","message":"Test serilog","ddsource":"csharp"}]

As MaxBackoff is 30 and MaxRetries is 10, it takes around 300 seconds to display an error.

Is your issue the formating of the message? https://github.com/DataDog/serilog-sinks-datadog-logs/blob/ebe50a2205e1fe334416e0d33ea54c6987a1aac3/src/Serilog.Sinks.Datadog.Logs/Sinks/Datadog/DatadogHttpClient.cs#L119

Do you want to have a way to set MaxRetries to see an error message faster?

DatadogSink implements PeriodicBatchingSink which handles the retries at https://github.com/serilog/serilog-sinks-periodicbatching/blob/313598fd6c0747489c0fdf3949238dd75e60dae0/src/Serilog.Sinks.PeriodicBatching/Sinks/PeriodicBatching/PeriodicBatchingSink.cs#L209-L258. PeriodicBatchingSink considers a batch (IEnumerable<LogEvent>) as sent or not sent. For a single batch, DatadogSink can sent several HTTP requests, some of them can succeed and others can failed. Throwing an exception in this case lead to retry the full batch (IEnumerable<LogEvent>) and so sending several times the same logs. As the retry logic of PeriodicBatchingSink cannot be configured, is there a reason to use it instead of the one in DatadogSink?

IrisClasson commented 3 years ago

Thank you for the reply @ogaca-dd . I saw the recent merge and that fixes the exception handling, which solves the problem for us. I'll close this issue.

Appreciate the help!