elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
103 stars 4.92k forks source link

Issues on duplicates occuring on both 8.14.1 and on 8.14.3 but not on 8.12.2 #40808

Open ZerNox opened 2 months ago

ZerNox commented 2 months ago

We have upgraded from 8.12.2 to first 8.14.1 and then to 8.14.3 but on both 8.14.1 and 8.14.3 we have issues with filebeat, that we think occurs in the filestream reader. No duplications on 8.12.2 if we revert the version and run the same load test.

Is there anything obvious that has changed in terms of the configuration between the versions that i have missed, or is there something that we should modify in our configuration?

For confirmed bugs, please report:

setup.ilm.enabled: false
setup.template.overwrite: true
#logging.level: debug
filebeat.inputs:
  - type: filestream
    paths:
      - "/application/logs/*.log"
      - "/application/logs/archive/*.log-*"
    json.keys_under_root: true
    json.overwrite_keys: true
    prospector.scanner.exclude_files: ["nlog-internal-error.log","^.log"]
    encoding: utf-8
    rotation.external.strategy.copytruncate:
      suffix_regex: \-\d{6}$\.\d$
      dateformat: -2006-01-02
processors:
  - decode_json_fields:
      fields: ["message"]
      process_array: false
      max_depth: 2
      target: ""
      overwrite_keys: true
      add_error_key: false
  - timestamp:
      field: timestamp
      layouts:
        - '2006-01-02T15:04:05.999999Z'
        - '2006-01-02T15:04:05Z'
        - '2006-01-02T15:04:05.999Z'
      test:
        - '2022-04-06T03:13:35.654269Z'
  - drop_fields:
      fields: ["timestamp"]
  - rename:
      when:
        has_fields: ['labels.MetricName']
      fields:
      - from: "labels.MetricName"
        to: "metric-name"
      - from: "labels.MetricValue"
        to: "metric-value"
      ignore_missing: true
      fail_on_error: true
botelastic[bot] commented 2 months ago

This issue doesn't have a Team:<team> label.

rlueckl commented 1 month ago

We're experiencing the same issue since upgrading to 8.14.x. I've narrowed it down to the problem being introduced in 8.14.0. The duplicates did not happen on 8.13.4. I suspect #36761 and #38488 to be the cause, but I couldn't confirm it yet.

The duplicates happen for us if we have a logfile which isn't updated within the close_inactive timeout. By using the settings from Inode reuse causes Filebeat to skip lines we could work around the issue and it's not happening anymore, even with 8.14.x

Our default config which causes duplicates on 8.14.x:

---
- type: filestream
  id: cassandra_system
  paths:
  - /var/log/cassandra/system.log
  encoding: plain
  document_type: log
  prospector:
     scanner:
        check_interval: 10s
  harvester_buffer_size: 16384
  message_max_bytes: 10485760
  parsers:
    - multiline:
        pattern: '^[[:space:]]*[A-Z]+[[:space:]]+\['
        negate: true
        match: after
  tail_files: false

  backoff.init: 1s
  backoff.max: 10s
  backoff_factor: 2

  close_inactive: 5m
  close_renamed: false
  close_removed: true
  close_eof: false
  clean_inactive: 0
  clean_removed: true
  close_timeout: 0

  fields:
    type: cassandra

  fields_under_root: true

New config working with 8.14.x:

---
- type: filestream
  id: cassandra_system
  paths:
  - /var/log/cassandra/system.log
  encoding: plain
  ignore_older: 48h
  document_type: log
  prospector:
     scanner:
        check_interval: 10s
  harvester_buffer_size: 16384
  message_max_bytes: 10485760
  parsers:
    - multiline:
        pattern: '^[[:space:]]*[A-Z]+[[:space:]]+\['
        negate: true
        match: after
  tail_files: false

  backoff.init: 1s
  backoff.max: 10s
  backoff_factor: 2

  close_inactive: 10m
  close_renamed: false
  close_removed: true
  close_eof: false
  clean_inactive: 72h
  clean_removed: true
  close_timeout: 0

  fields:
    type: cassandra

  fields_under_root: true

So the differences between the two configs are:

  ignore_older: 48h (vs. not set)
  close_inactive: 10m (vs. 5m)
  clean_inactive: 72h (vs. 0)

Most of these settings are default values coming from pcfens/filebeat and after playing around with the settings only clean_inactive seems to affect the issue. If I set it to anything other than 0, there are no duplicates. If clean_inactive: 0 is set, the duplicates start coming.

Some more investigation revealed that setting clean_inactive: 0 in 8.14.x has the effect of nulling out the cursor and meta fields in the registry file (/var/lib/filebeat/filebeat/log.json):

clean_inactive: 0 in 8.13.x:

{"op":"set","id":65254}
{"k":"filestream::cassandra_system::native::262312-2051","v":{"ttl":1800000000000,"updated":[516033235298,1726832301],"cursor":{"offset":2052210},"meta":{"source":"/var/log/cassandra/system.log","identifier_name":"native"}}}
{"op":"set","id":65255}
{"k":"filestream::cassandra_system::native::262312-2051","v":{"ttl":1800000000000,"updated":[516035577754,1726832336],"cursor":{"offset":2052383},"meta":{"source":"/var/log/cassandra/system.log","identifier_name":"native"}}}

clean_inactive: 0 in 8.14.x:

{"op":"set","id":65260}
{"k":"filestream::cassandra_system::native::262312-2051","v":{"cursor":null,"meta":null,"ttl":0,"updated":[515468346993,1726834065]}}
{"op":"set","id":65261}
{"k":"filestream::cassandra_system::native::262312-2051","v":{"ttl":0,"updated":[515468400457,1726834965],"cursor":null,"meta":null}}

Shouldn't the cursor.offset (and meta) have a value, even if clean_inactive is zero?

ZerNox commented 1 month ago

@rlueckl 🙏 we will try this aswell. Have anyone considered moving to fingerprint to avoid these inode issues ? And is the performance hit still is Valid https://www.elastic.co/blog/introducing-filestream-fingerprint-mode

rlueckl commented 1 month ago

Thanks for the link, I didn't know about fingerprint mode.

In our particular case there are no performance issues to consider since we're only collecting one file per node (/var/log/cassandra/system.log) and the <inode>-<device_id> looks stable from what I can tell from log.json. There's a new "inode" every day when the file gets rotated and the "device_id" is unchanged. So the <inode>-<device_id> stays the same during the day and is unique every day.

ZerNox commented 1 month ago

Is this issue confirmed from the project aswell ?

neeej commented 1 month ago

I'm seeing similar issue with duplicate log entries as well since moving to 8.14.3. Did not have this issue in 8.12.2.

I run on Amazon Linux 2.

I have this config below with fingerprint configured, as we had issues before using filestream with Java SpringBoot logs, and inode reusage.

- type: filestream
  id: json-logs
  tags: ["filebeat", "json"]
  encoding: utf-8
  ignore_older: 30m

  paths:
    - "/var/opt/logs/*/*-json.log*"

  file_identity.fingerprint: ~
  prospector.scanner:
    exclude_files: ['\.gz$']
    fingerprint:
      enabled: true
      offset: 0
      length: 1024
neeej commented 1 month ago

This should be fixed in 8.15.0 according to https://github.com/elastic/beats/pull/40258