elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Unable to set max_bytes in filebeats #39753

Open pvcasillasg opened 6 months ago

pvcasillasg commented 6 months ago

For confirmed bugs, please report:

filebeat.yaml

filebeat.inputs:
  - type: filestream
    id: xml-oscap
    enabled: true
    encoding: utf-8
    #message_max_bytes: 20971520
    max_bytes: 20971520
    paths:
      - /home/ansible/ansible_openscap/oscap-reports/*.xml
    parsers:
      - multiline:
          type: pattern
          pattern: '^<\?xml*'
          #flush_pattern: '^[\S]*<\/Benchmark>'
          negate: true
          match: after
          max_lines: 1000000000
      #     max_bytes: 20971520
      #   close_eof: true
# ============================== Filebeat modules ==============================
logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  hosts: ["192.168.1.220:5044"]
  bulk_max_size: 20971520

# ================================= Processors =================================
processors:
  #- add_host_metadata:
  #    when.not.contains.tags: forwarded
  #- add_cloud_metadata: ~
  #- add_docker_metadata: ~
  #- add_kubernetes_metadata: ~
  #- decode_xml:
  #    field: message
  #    target_field: TestResult
  #    to_lower: true

Error:

2024-05-28 16:54:38.965305489 +0000 UTC m=+6.335550826 write error: data size (11554286 bytes) is greater than the max file size (10485760 bytes)

It seems Filebeat is not applying the max_bytes parameter I configured in filebeat.yaml.

elasticmachine commented 6 months ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

pierrehilbert commented 6 months ago

I will let @belimawr keep me honest here, but max_bytes is only for the log input; message_max_bytes is the way to go with Filestream.
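
For reference, a minimal sketch of what that would look like on the filestream input from the config above (the 20 MB value and path are simply the ones used there, not recommendations):

filebeat.inputs:
  - type: filestream
    id: xml-oscap
    enabled: true
    paths:
      - /home/ansible/ansible_openscap/oscap-reports/*.xml
    # For filestream, the per-message size cap is message_max_bytes;
    # max_bytes is the equivalent setting on the older log input.
    message_max_bytes: 20971520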

pvcasillasg commented 6 months ago

I will let @belimawr keep me honest here, but max_bytes is only for the log input; message_max_bytes is the way to go with Filestream.

Yeah, I know, but I was desperate and tried a lot of things, including combinations like those.

I finally managed to make it work with message_max_bytes, which I had also tried before.

But I still get the same error: write error: data size (11554286 bytes) is greater than the max file size (10485760 bytes)

belimawr commented 6 months ago

Yes, for the Filestream input, you need to use message_max_bytes as stated in our documentation.

@pvcasillasg, could you post here the whole log file at debug level?

Just to clarify your case: you're trying to ingest a file that is 10Mb+ as a single event with Filebeat and using Logstash as output, is that correct?

pvcasillasg commented 6 months ago

Yes, for the Filestream input, you need to use message_max_bytes as stated in our documentation.

@pvcasillasg, could you post here the whole log file at debug level?

Just to clarify your case: you're trying to ingest a file that is 10Mb+ as a single event with Filebeat and using Logstash as output, is that correct?

Yes, that's exactly the case, and I have message_max_bytes set up correctly in my configuration file. I will upload the log within a few days, since I'm away from home.

pvcasillasg commented 6 months ago

Yes, for the Filestream input, you need to use message_max_bytes as stated in our documentation.

@pvcasillasg, could you post here the whole log file at debug level?

Just to clarify your case: you're trying to ingest a file that is 10Mb+ as a single event with Filebeat and using Logstash as output, is that correct?

root@docker:/var/lib/filebeat/registry/filebeat# filebeat -c /etc/filebeat/filebeat.yml
2024-06-03 16:17:14.301447465 +0000 UTC m=+21.541617428 write error: data size (11491365 bytes) is greater than the max file size (10485760 bytes)
2024-06-03 16:17:14.312396991 +0000 UTC m=+21.552566934 write error: data size (11554385 bytes) is greater than the max file size (10485760 bytes)

filebeat-20240603.log

Attached is the log from the Filebeat run.

pvcasillasg commented 6 months ago

Also, after cleaning up the test settings in my filebeat.yml, I found that if I don't set max_bytes, the filestream input keeps sending incomplete events to the output.

The working config for filebeat in my case is:

filebeat.inputs:
  - type: filestream
    id: xml-oscap_pre
    enabled: true
    encoding: utf-8
    message_max_bytes: 52428800
    max_bytes: 52428800
    paths:
      - /home/ansible/ansible_openscap/reports/pre_reports/*.xml
    parsers:
      - multiline:
          type: pattern
          pattern: '^<\?xml*'
          negate: true
          match: after
          max_lines: 30000000
          timeout: 20s

Without max_bytes, everything crashes.

belimawr commented 6 months ago

@pvcasillasg could you also post your output configuration? Redact all sensitive information like credentials, Domains, IPs, etc.

pvcasillasg commented 6 months ago

@pvcasillasg could you also post your output configuration? Redact all sensitive information like credentials, Domains, IPs, etc.

Here is the full YAML file. As I said before, if I comment out or delete the max_bytes line, it stops working.

filebeat.inputs:
  - type: filestream
    id: xml-oscap
    enabled: true
    encoding: utf-8
    message_max_bytes: 52428800
    max_bytes: 52428800
    paths:
      - $PATH
    parsers:
      - multiline:
          type: pattern
          pattern: '^<\?xml*'
          # flush_pattern: '^[\S]*<\/Benchmark>'
          negate: true
          match: after
          max_lines: 3000000
          timeout: 20s
logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: "0644"

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 1

setup.kibana:

output.logstash:
  hosts: ["192.168.1.220:5044"]
  bulk_max_size: 4096

processors:
  # - add_host_metadata:
  #     when.not.contains.tags: forwarded
  # - add_cloud_metadata: ~
  # - add_docker_metadata: ~
  # - add_kubernetes_metadata: ~
  # - decode_xml:
  #    field: message
  #    target_field: TestResult
  #    to_lower: true

VihasMakwana commented 6 months ago

@pvcasillasg I think the error is arising because you have logging.level set to debug, and Filebeat then logs the entire event (the 20 MB XML in your case). Filebeat's own log files are limited to 10 MB each by default.

You can fix it in one of two ways:

  1. set logging.files.rotateeverybytes to more than 20 MB (maybe 21 MB, as the log entry carries some other information as well); see the sketch below
  2. set logging.level to info
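
For reference, a minimal sketch of option 1 applied to the logging section from the config above; 26214400 bytes (25 MB) is only an example value, not a recommendation:

logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644
  # Rotate Filebeat's own log file at 25 MB instead of the default 10 MB,
  # so a single debug-logged ~20 MB event still fits in one file.
  rotateeverybytes: 26214400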

pvcasillasg commented 5 months ago

@pvcasillasg I think the error is arising because you have logging.level set to debug, and Filebeat then logs the entire event (the 20 MB XML in your case). Filebeat's own log files are limited to 10 MB each by default.

You can fix it in one of two ways:

  1. set logging.files.rotateeverybytes to more than 20 MB (maybe 21 MB, as the log entry carries some other information as well)
  2. set logging.level to info

Huh, I don't think so.

I only enabled the debug log level in order to attach the logs for this issue. Without configuring any log level, I'm still getting the same error.

VihasMakwana commented 5 months ago

@pvcasillasg Okay, understood. Can you still update logging.files.rotateeverybytes to a bigger value, run it again, and see whether you still hit the error?

In my case, this issue was reproducible and increasing logging.files.rotateeverybytes fixed it.

VihasMakwana commented 4 months ago

@pvcasillasg Hi! Just checking in to see whether the workaround worked for you.