elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Filebeat][aws-s3] "+" character replaced by a space in the s3 key #33998

Open pdelormekpler opened 1 year ago

pdelormekpler commented 1 year ago

Hi,

I'm testing the aws-s3 input in Filebeat to ship logs from S3 to Elasticsearch.

I'm using Filebeat 8.5.2 with this input config:

filebeat.inputs:
  - type: aws-s3
    bucket_arn: arn:aws:s3:::logs-XXX
    bucket_list_prefix: NNN/insight_daily/daily_insight_content/
    access_key_id: ...
    secret_access_key: ...
    number_of_workers: 5
    bucket_list_interval: 60s
    expand_event_list_from_field: Records
    default_region: eu-west-1

Startup looks fine, but fetching the objects fails:

  1. Filebeat is able to list the contents of the bucket / subfolder.
  2. It then cannot find any of the files, because the "+" character in the key is replaced by a space.

Logs:

"error":{"message":"failed processing S3 event for object key \"NNN/insight_daily/daily_insight_content/2022-04-01T00:00:00 00:00/1.log\" in bucket \"logs-XXX\": failed to get s3 object (elapsed_time_ns=116900493): s3 GetObject failed: operation error S3: GetObject, https response error StatusCode: 404, RequestID: HQ9MQ4S7ZGTZ131S, HostID: G+8aU7zzkfzHIUjjTJM5AdKsJKRHKovA81Gi1wOAx3nmOgE/EFQbkYdl9QVwrgFd2DJQjlIJfk0=, NoSuchKey:

In my S3 bucket, the actual object key is: NNN/insight_daily/daily_insight_content/2022-04-01T00:00:00+00:00/1.log

I saw that a related PR was merged, but it does not seem to work for "+", at least.

Thank you for your help.

tetianakravchenko commented 1 year ago

@elastic/obs-cloud-monitoring fyi

aspacca commented 1 year ago

@pdelormekpler the S3 object key unescaping introduced in the PR you mentioned is indeed what turns the + sign into a space character.

See this Go playground snippet if you want to understand what happens.
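
A minimal sketch of the effect, assuming the key is decoded with Go's `url.QueryUnescape` from `net/url`. The helper name `unescapeQuery` is illustrative and is not taken from the Filebeat code:

```go
package main

import (
	"fmt"
	"net/url"
)

// unescapeQuery is a hypothetical helper that decodes a key using
// query-string rules: "+" becomes a space and %XX sequences are decoded.
func unescapeQuery(key string) (string, error) {
	return url.QueryUnescape(key)
}

func main() {
	// Key as returned by a direct bucket listing (not URL-encoded).
	key := "NNN/insight_daily/daily_insight_content/2022-04-01T00:00:00+00:00/1.log"

	decoded, err := unescapeQuery(key)
	if err != nil {
		fmt.Println("unescape failed:", err)
		return
	}

	fmt.Printf("listed key:  %q\n", key)
	fmt.Printf("unescaped:   %q\n", decoded)
}
```

The unescaped value contains `00:00:00 00:00` with a space where the `+` was, which matches the key shown in the 404 error above.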

There is indeed a bug in the Filebeat input: we should unescape the S3 object key only when the value comes from an S3-SQS notification. In your case, since you are using the direct S3 listing input, the key is not escaped and we should not unescape it.
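
The idea can be sketched as follows. This is only an illustration of the intended behavior, not the actual Filebeat fix: the `keySource` type, its constants, and `objectKey` are hypothetical names, and the sketch assumes that keys in S3 event notifications are URL-encoded (so a literal `+` arrives as `%2B`) while keys from a bucket listing are raw.

```go
package main

import (
	"fmt"
	"net/url"
)

// keySource is a hypothetical marker for where an object key came from.
type keySource int

const (
	fromBucketListing   keySource = iota // raw key from a direct bucket listing
	fromSQSNotification                  // URL-encoded key from an S3 event notification
)

// objectKey returns the key to use for GetObject. Only keys that came
// from an SQS notification are unescaped; listed keys pass through as-is.
func objectKey(raw string, src keySource) (string, error) {
	if src == fromSQSNotification {
		return url.QueryUnescape(raw)
	}
	return raw, nil
}

func main() {
	listed := "2022-04-01T00:00:00+00:00/1.log"
	notified := "2022-04-01T00%3A00%3A00%2B00%3A00/1.log"

	k1, _ := objectKey(listed, fromBucketListing)
	k2, err := objectKey(notified, fromSQSNotification)
	if err != nil {
		fmt.Println("unescape failed:", err)
		return
	}

	fmt.Printf("from listing:      %q\n", k1)
	fmt.Printf("from notification: %q\n", k2)
}
```

With this split, both sources resolve to the same key, with the `+` preserved.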

botelastic[bot] commented 5 months ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!