elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
12.16k stars 4.91k forks source link

Integer overflow in Beats metrics #5433

Closed kvch closed 4 years ago

kvch commented 6 years ago

Version: 6.0.0-rc2 Operating System: Debian GNU/Linux 9 Beat: Filebeat

Configuration

#=========================== Filebeat prospectors =============================

filebeat.prospectors:
- type: stdin
  enabled: true

#------------------------------- Kafka output ----------------------------------
output.kafka:
  # Boolean flag to enable or disable the output module.
  enabled: true

  # The list of Kafka broker addresses from where to fetch the cluster metadata.
  # The cluster metadata contain the actual Kafka brokers events are published
  # to.
  hosts: ["localhost:9092"]

  # The Kafka topic used for produced events. The setting can be a format string
  # using any event field. To set the topic from document type use `%{[type]}`.
  topic: beats

  # The Kafka event partitioning strategy. Default hashing strategy is `hash`
  # using the `output.kafka.key` setting or randomly distributes events if
  # `output.kafka.key` is not configured.
  partition.hash:
    # If enabled, events will only be published to partitions with reachable
    # leaders. Default is false.
    reachable_only: true

Failure

After sending all events to Kafka:

2017/10/24 20:42:46.204889 log.go:99: DBG Source is not continuable: /dev/stdin
2017/10/24 20:42:46.204899 harvester.go:230: INFO End of file reached: . Closing because close_eof is enabled.
2017/10/24 20:42:46.204910 harvester.go:453: DBG Stopping harvester for file:
2017/10/24 20:42:46.204928 harvester.go:463: DBG Closing file:
2017/10/24 20:42:46.204934 harvester.go:474: DBG harvester cleanup finished for file:
2017/10/24 20:42:46.204952 logger.go:18: DBG client: closing acker
2017/10/24 20:42:46.204960 logger.go:18: DBG client: done closing acker
2017/10/24 20:42:46.204966 logger.go:29: DBG client: cancelled 0 events
2017/10/24 20:42:46.876412 client.go:220: DBG finished kafka batch
2017/10/24 20:42:46.876487 logger.go:29: DBG broker ACK events: count=615, start-seq=280240, end-seq=280854
2017/10/24 20:42:46.876507 logger.go:18: DBG ackloop: return ack to broker loop:615
2017/10/24 20:42:46.876518 logger.go:18: DBG ackloop:  done send ack

The metric libbeat.pipeline.events.active is a negative number in logs. But according to the code in libbeat/publisher/pipeline/monitoring.go it is a 64 bit unsigned integer. So either it is not shown correctly by logp or it is overflown. Seeing a negative value might be confusing for users, too.

2017/10/24 20:43:14.501965 metrics.go:39: INFO Non-zero metrics in the last 30s:
beat.memstats.gc_next=1643024
beat.memstats.memory_alloc=-221680
beat.memstats.memory_total=39758080
filebeat.events.active=2762
filebeat.events.added=2762
filebeat.harvester.closed=1
filebeat.harvester.open_files=-1
filebeat.harvester.running=-1
libbeat.output.events.acked=3933
libbeat.output.events.batches=3
libbeat.output.events.total=3933
libbeat.outputs.kafka.bytes_read=629
libbeat.outputs.kafka.bytes_write=58564
libbeat.pipeline.clients=-1
libbeat.pipeline.events.active=-1171  # negative, but it is a uint64
libbeat.pipeline.events.published=2763
libbeat.pipeline.events.total=2762
libbeat.pipeline.queue.acked=3933

During shutdown everything seems normal.

2017/10/24 20:44:42.349254 metrics.go:51: INFO Total non-zero values: 
beat.memstats.gc_next=5837328
beat.memstats.memory_alloc=3066776
beat.memstats.memory_total=3192738768
filebeat.events.active=280854
filebeat.events.added=280854
filebeat.harvester.closed=1
filebeat.harvester.open_files=-1
filebeat.harvester.started=1
libbeat.config.reloads=1
libbeat.output.events.acked=280854
libbeat.output.events.batches=179
libbeat.output.events.total=280854
libbeat.output.type=kafka
libbeat.outputs.kafka.bytes_read=38554
libbeat.outputs.kafka.bytes_write=4101451
libbeat.pipeline.events.published=280854
libbeat.pipeline.events.retry=1680
libbeat.pipeline.events.total=280854
libbeat.pipeline.queue.acked=280854
registrar.states.current=8
registrar.writes=1
tsg commented 6 years ago

We've discussed this in the meeting today and decided to go for a two phases solution:

kvch commented 4 years ago

Still not resolved 100%, but tracked in a meta issue, so closing.