fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Splunk/Custom HTTP Retry Codes #9371

Closed meehanman closed 1 month ago

meehanman commented 1 month ago

When Splunk is having a bad time, you'll get a "408 - Your request timed out. Try again." response, but neither the out_http nor the out_splunk plugin will ever retry it :face_palm:

out_http

if (c->resp.status >= 400 && c->resp.status < 500 && c->resp.status != 429) {
    flb_plg_warn(ctx->ins, "could not flush records to %s:%i (http_do=%i), "
                    "chunk will not be retried",
                    ctx->host, ctx->port, ret);
    out_ret = FLB_ERROR;
}

out_splunk

/*
 * Requests that get 4xx responses from the Splunk HTTP Event
 * Collector will 'always' fail, so there is no point in retrying
 * them:
 *
 * https://docs.splunk.com/Documentation/Splunk/8.0.5/Data/TroubleshootHTTPEventCollector#Possible_error_codes
 */
ret = (c->resp.status < 400 || c->resp.status >= 500) ?
    FLB_RETRY : FLB_ERROR;

Describe the solution you'd like

The Splunk output plugin should conform more closely to the troubleshooting documentation that Splunk provides, which indicates that 408 and 429 HTTP responses should trigger a retry rather than return FLB_ERROR.
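
A minimal sketch of that retry decision, assuming 408 and 429 are the only 4xx codes worth retrying. The `sketch_*` names and `SKETCH_*` constants are hypothetical stand-ins for Fluent Bit's FLB_OK / FLB_RETRY / FLB_ERROR; this is not the code from any merged PR.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for Fluent Bit's FLB_OK / FLB_RETRY / FLB_ERROR. */
enum sketch_result { SKETCH_OK, SKETCH_RETRY, SKETCH_ERROR };

/* 4xx codes that describe a transient condition (assumption: only these two). */
static bool sketch_is_transient_4xx(int status)
{
    return status == 408 || status == 429;
}

/* Map an HTTP status code to a flush result. */
static enum sketch_result sketch_retry_decision(int status)
{
    if (status >= 200 && status < 300) {
        return SKETCH_OK;
    }
    /* Permanent client errors: retrying the same chunk cannot succeed. */
    if (status >= 400 && status < 500 && !sketch_is_transient_4xx(status)) {
        return SKETCH_ERROR;
    }
    /* Everything else, including 408, 429 and 5xx, is retried. */
    return SKETCH_RETRY;
}
```

With this mapping a 408 from Splunk yields a retry, while a 400 or 403 still drops the chunk.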

For the HTTP output plugin, it would be great to add options to Configuring Retries that allow more control over the retry logic, such as a list of additional HTTP status codes to retry on, e.g. if you wanted to retry on HTTP code 666 you could. In theory, we'd just want to add 408 and 429 to the list.
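
One way such an option could work, as a rough sketch: parse a comma-separated list of status codes once at configuration time, then consult it on every response. The list format, the `sketch_*` names, and the 16-entry limit are all assumptions made for illustration; no such option is confirmed to exist in out_http.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_CODES 16   /* arbitrary limit for this sketch */

struct sketch_retry_codes {
    int codes[SKETCH_MAX_CODES];
    size_t count;
};

/*
 * Parse a list such as "408,429" (hypothetical option value).
 * Returns 0 on success, -1 on malformed input or too many entries.
 */
static int sketch_parse_retry_codes(const char *value,
                                    struct sketch_retry_codes *out)
{
    const char *p = value;

    out->count = 0;
    while (*p != '\0') {
        char *end;
        long code = strtol(p, &end, 10);   /* skips leading whitespace */

        if (end == p || code < 100 || code > 999 ||
            out->count == SKETCH_MAX_CODES) {
            return -1;
        }
        out->codes[out->count++] = (int) code;

        while (*end == ' ') {
            end++;
        }
        if (*end == ',') {
            end++;                          /* move past the separator */
        }
        else if (*end != '\0') {
            return -1;                      /* unexpected character */
        }
        p = end;
    }
    return 0;
}

/* True if the status code appears in the configured list. */
static bool sketch_should_retry(const struct sketch_retry_codes *rc, int status)
{
    for (size_t i = 0; i < rc->count; i++) {
        if (rc->codes[i] == status) {
            return true;
        }
    }
    return false;
}
```

A linear scan is enough here because the list is expected to hold only a handful of codes.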

Describe alternatives you've considered

The only real option we have is to set up an intermediary proxy server that forwards the requests and, when the upstream returns a 4xx error, replaces it with a retryable HTTP response code.

agup006 commented 1 month ago

@cosmo0920 is this something we could add for next sprint?

cosmo0920 commented 1 month ago

Hi, I sent PRs to extend retry handling to HTTP status codes 408 and 429.

cosmo0920 commented 1 month ago

This can be marked as done, since the two PRs have been merged into master.