elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Filebeat] httpjson - request tracer fails with long URL #35157

Open andrewkroh opened 1 year ago

andrewkroh commented 1 year ago

When the request tracer feature is used with input ID substitution, the resulting file name can become longer than the maximum allowed file name length.

An input ID can be set to any value by the user, and for stateful inputs (e.g. those with a cursor) Filebeat also appends the URL, so the substituted ID value can get very long.

If the file name exceeds the maximum file name length, you end up with no tracer logs and an error from the logger that is written directly to stderr (bypassing the Beat logger).

This is probably a rare edge case, but given that users might not be able to control the URL (and Filebeat forces it into the input ID) I think consideration should be given to guarding against this problem.

Observed error:

2023-04-20 15:55:54.014538 -0400 EDT m=+1.159689793 write error: error getting log file info: stat ../../logs/httpjson/http-request-trace-httpjson-foo-eb837d4c-5ced-45ed-b05c-de658135e248_https_api.io_v1_reporting_issues_?page=1&perPage=10&sortBy=issueTitle&order=asc&groupBy=issue&key=aHR0cC1yZXF1ZXN0LXRyYWNlLWh0dHBqc29uLWZvby1lYjgzN2Q0Yy01Y2VkLTQ1ZWQtYjA1Yy1kZTY1ODEzNWUyNDhfaHR0cHNfYXBpLnNueWsuaW9fdjFfcmVwb3J0aW5nX2lzc3Vlc18_cGFnZT0xJnBlclBhZ2U9MTAmc29ydEJ5PWlzc3VlVGl0bGUmb3JkZXI9YXNjJmdyb3VwQnk9aXNzdWUubmRqc29u.ndjson: file name too long

Config used:

filebeat.inputs:
  - type: httpjson
    id: httpjson-foo-eb837d4c-5ced-45ed-b05c-de658135e248
    config_version: 2
    publisher_pipeline.disable_host: true
    interval: 1m
    request.url: https://api.io/v1/reporting/issues/?page=1&perPage=10&sortBy=issueTitle&order=asc&groupBy=issue&key=aHR0cC1yZXF1ZXN0LXRyYWNlLWh0dHBqc29uLWZvby1lYjgzN2Q0Yy01Y2VkLTQ1ZWQtYjA1Yy1kZTY1ODEzNWUyNDhfaHR0cHNfYXBpLnNueWsuaW9fdjFfcmVwb3J0aW5nX2lzc3Vlc18/cGFnZT0xJnBlclBhZ2U9MTAmc29ydEJ5PWlzc3VlVGl0bGUmb3JkZXI9YXNjJmdyb3VwQnk9aXNzdWUubmRqc29u
    response.decode_as: application/json
    cursor:
      last_cursor:
        value: '[[.last_response.body]]'
    request.tracer.filename: ../../logs/httpjson/http-request-trace-*.ndjson

output.console.pretty: true

elasticmachine commented 1 year ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

andrewkroh commented 1 year ago

This is not a full solution to the problem, but it does avoid the URL being included (as in the case of Fleet). It reads the ID from the config rather than getting it from the input v2 runner.

diff --git a/x-pack/filebeat/input/httpjson/config.go b/x-pack/filebeat/input/httpjson/config.go
index 74043594a6..4d0cf6c4f8 100644
--- a/x-pack/filebeat/input/httpjson/config.go
+++ b/x-pack/filebeat/input/httpjson/config.go
@@ -14,6 +14,7 @@ import (
 )

 type config struct {
+   ID       string          `config:"id"`
    Interval time.Duration   `config:"interval" validate:"required"`
    Auth     *authConfig     `config:"auth"`
    Request  *requestConfig  `config:"request" validate:"required"`
diff --git a/x-pack/filebeat/input/httpjson/input.go b/x-pack/filebeat/input/httpjson/input.go
index 6e1d3e8ca3..5634a9ed9a 100644
--- a/x-pack/filebeat/input/httpjson/input.go
+++ b/x-pack/filebeat/input/httpjson/input.go
@@ -114,7 +114,12 @@ func run(
    stdCtx := ctxtool.FromCanceller(ctx.Cancelation)

    if config.Request.Tracer != nil {
-       id := sanitizeFileName(ctx.ID)
+       id := ctx.ID
+       if config.ID != "" {
+           // If the user explicitly configured an ID use it.
+           id = config.ID
+       }
+       id = sanitizeFileName(id)
        config.Request.Tracer.Filename = strings.ReplaceAll(config.Request.Tracer.Filename, "*", id)
    }

elasticmachine commented 7 months ago

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

efd6 commented 3 months ago

It's worse than this. I recently saw a support case where, due to input ID elaboration, the trace file ended up with a base path that was short enough to be written into the zip, but too long to be extracted without significant effort.