elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Node Stats API gives wrong values during persistent queue draining #13832

Closed RemiRigal closed 2 years ago

RemiRigal commented 2 years ago

Logstash information:

Please include the following information:

  1. Logstash version: 7.12.0
  2. Logstash installation source: docker image logstash/logstash-oss
  3. How is Logstash being run: OpenShift/docker

Plugins installed: logstash-codec-avro (3.2.4) logstash-codec-cef (6.1.1) logstash-codec-collectd (3.0.8) logstash-codec-dots (3.0.6) logstash-codec-edn (3.0.6) logstash-codec-edn_lines (3.0.6) logstash-codec-es_bulk (3.0.8) logstash-codec-fluent (3.3.0) logstash-codec-graphite (3.0.5) logstash-codec-json (3.0.5) logstash-codec-json_lines (3.0.6) logstash-codec-line (3.0.8) logstash-codec-msgpack (3.0.7) logstash-codec-multiline (3.0.10) logstash-codec-netflow (4.2.1) logstash-codec-plain (3.0.6) logstash-codec-rubydebug (3.1.0) logstash-filter-aggregate (2.9.1) logstash-filter-anonymize (3.0.6) logstash-filter-cidr (3.1.3) logstash-filter-clone (4.0.0) logstash-filter-csv (3.0.10) logstash-filter-date (3.1.9) logstash-filter-de_dot (1.0.4) logstash-filter-dissect (1.2.0) logstash-filter-dns (3.1.4) logstash-filter-drop (3.0.5) logstash-filter-elasticsearch (3.9.3) logstash-filter-fingerprint (3.2.2) logstash-filter-geoip (6.0.5) logstash-filter-grok (4.4.0) logstash-filter-http (1.0.2) logstash-filter-json (3.1.0) logstash-filter-kv (4.4.1) logstash-filter-memcached (1.1.0) logstash-filter-metrics (4.0.7) logstash-filter-mutate (3.5.0) logstash-filter-prune (3.0.4) logstash-filter-ruby (3.1.5) logstash-filter-sleep (3.0.7) logstash-filter-split (3.1.8) logstash-filter-syslog_pri (3.0.5) logstash-filter-throttle (4.0.4) logstash-filter-translate (3.2.3) logstash-filter-truncate (1.0.4) logstash-filter-urldecode (3.0.6) logstash-filter-useragent (3.2.4) logstash-filter-uuid (3.0.5) logstash-filter-xml (4.1.1) logstash-input-azure_event_hubs (1.2.3) logstash-input-beats (6.1.0) logstash-input-couchdb_changes (3.1.6) logstash-input-dead_letter_queue (1.1.5) logstash-input-elasticsearch (4.9.1) logstash-input-exec (3.3.3) logstash-input-file (4.2.3) logstash-input-ganglia (3.1.4) logstash-input-gelf (3.3.0) logstash-input-generator (3.0.6) logstash-input-graphite (3.0.6) logstash-input-heartbeat (3.0.7) logstash-input-http (3.3.7) logstash-input-http_poller (5.0.2) 
logstash-input-imap (3.1.0) logstash-input-jms (3.1.2) logstash-input-pipe (3.0.7) logstash-input-redis (3.6.0) logstash-input-s3 (3.5.0) logstash-input-snmp (1.2.7) logstash-input-snmptrap (3.0.6) logstash-input-sqs (3.1.3) logstash-input-stdin (3.2.6) logstash-input-syslog (3.4.5) logstash-input-tcp (6.0.7) logstash-input-twitter (4.0.3) logstash-input-udp (3.4.0) logstash-input-unix (3.0.7) logstash-integration-jdbc (5.0.6) ├── logstash-input-jdbc ├── logstash-filter-jdbc_streaming └── logstash-filter-jdbc_static logstash-integration-kafka (10.7.1) ├── logstash-input-kafka └── logstash-output-kafka logstash-integration-rabbitmq (7.2.0) ├── logstash-input-rabbitmq └── logstash-output-rabbitmq logstash-output-cloudwatch (3.0.9) logstash-output-csv (3.0.8) logstash-output-elastic_app_search (1.1.1) logstash-output-elasticsearch (10.8.2) logstash-output-email (4.1.1) logstash-output-file (4.3.0) logstash-output-graphite (3.1.6) logstash-output-http (5.2.4) logstash-output-lumberjack (3.1.8) logstash-output-nagios (3.0.6) logstash-output-null (3.0.5) logstash-output-pipe (3.0.6) logstash-output-redis (5.0.0) logstash-output-s3 (4.3.3) logstash-output-sns (4.0.7) logstash-output-sqs (6.0.0) logstash-output-stdout (3.1.4) logstash-output-tcp (6.0.0) logstash-output-udp (3.1.0) logstash-output-webhdfs (3.0.6) logstash-patterns-core (4.3.0)

JVM: OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9) OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)

OS version: Linux hostname 3.10.0-957.12.2.el7.x86_64 #1 SMP Fri Apr 19 21:09:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior: When queue.drain is set to true with the persistent queue (PQ) enabled, Logstash drains the queue on SIGTERM as expected, but some of the values returned by the Node Stats API stop being updated.

The events stats appear to be the only ones still receiving updates: fields such as events.in and events.out keep changing, while all other fields, such as process.cpu.percent and pipeline.queue.capacity.queue_size_in_bytes, stay stuck at their last value.

Steps to reproduce:

  1. Start a pipeline with the following settings:
    • queue.type: persisted
    • queue.drain: true
  2. Fill the pipeline's queue by sending many events while a sleep filter slows down processing:
    filter {
      sleep {
        time => "4"
      }
    }
  3. Monitor stats regularly with the Logstash API
  4. Send a SIGTERM to Logstash

Expected behaviour: the values returned by the Logstash API should keep being updated for as long as the pipelines are not completely drained.
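
For step 3, a small polling script can make the stuck fields easier to spot. This is a minimal sketch, not part of the original report: it assumes the API is reachable at the default http://localhost:9600/_node/stats, and the helper names (fetch_stats, flatten, frozen_fields, poll) are hypothetical. It flattens each response and lists the numeric fields whose value never changed across the collected snapshots.

```python
import json
import time
import urllib.request

# Assumption: default Logstash API host and port; adjust for your deployment.
STATS_URL = "http://localhost:9600/_node/stats"


def fetch_stats(url=STATS_URL, timeout=5):
    """Return one Node Stats API response as a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}, keeping numeric leaves only."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[path] = value
    return flat


def frozen_fields(snapshots):
    """Return the sorted paths whose value is identical in every snapshot.

    Expects at least one snapshot.
    """
    flats = [flatten(s) for s in snapshots]
    common = set(flats[0]).intersection(*flats[1:])
    return sorted(p for p in common if all(f[p] == flats[0][p] for f in flats))


def poll(count=5, interval=2.0):
    """Collect `count` snapshots, sleeping `interval` seconds after each one."""
    snapshots = []
    for _ in range(count):
        snapshots.append(fetch_stats())
        time.sleep(interval)
    return snapshots
```

Calling frozen_fields(poll()) once before and once after sending the SIGTERM gives two lists to compare. Some fields are legitimately constant (configuration values, for instance), so only the fields that appear in the second list but not in the first are of interest.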

RemiRigal commented 2 years ago

After some testing, the same behaviour is observed with Logstash 8.0.1 (docker image docker.elastic.co/logstash/logstash-oss:8.0.1 and bundled Java version).

RemiRigal commented 2 years ago

I noticed that in agent.rb, stop_collecting_metrics is called before shutdown_pipelines regardless of the queue.drain setting; see the snippet here: https://github.com/elastic/logstash/blob/8edce82170951d421bf308228fc3baab0212f4e7/logstash-core/lib/logstash/agent.rb#L221-L231

Would it cause any harm to call stop_collecting_metrics after the pipelines have been shut down?
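
To illustrate why the call order would matter, here is a toy model in Python. It is not Logstash code: the ToyAgent class and its fields are hypothetical, only the two method names are borrowed from agent.rb, and it assumes that gauge-style fields such as the queue size are refreshed by a periodic collector rather than by the pipeline itself.

```python
class ToyAgent:
    """Toy model of the shutdown sequence; not the real Logstash agent."""

    def __init__(self, queued_events):
        self.queue = queued_events                # events still waiting in the queue
        self.collecting = True                    # is periodic collection enabled?
        self.reported_queue_size = queued_events  # value a stats API would return

    def collect(self):
        # Periodic collector tick: refreshes the gauge only while enabled.
        if self.collecting:
            self.reported_queue_size = self.queue

    def stop_collecting_metrics(self):
        self.collecting = False

    def shutdown_pipelines(self):
        # Draining: process one event at a time, with a collector tick after each.
        while self.queue > 0:
            self.queue -= 1
            self.collect()


def shutdown(agent, metrics_first):
    """Run both shutdown steps in the chosen order; return the reported gauge."""
    if metrics_first:
        agent.stop_collecting_metrics()  # order described in this issue
        agent.shutdown_pipelines()
    else:
        agent.shutdown_pipelines()       # order proposed in the question above
        agent.stop_collecting_metrics()
    return agent.reported_queue_size
```

In this model, shutdown(ToyAgent(3), metrics_first=True) leaves the reported queue size stuck at 3, while metrics_first=False lets it follow the drain down to 0. Whether the real agent can be reordered this way without side effects is exactly the open question.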

kaisecheng commented 2 years ago

Thanks for reporting the issue! Your observation about stop_collecting_metrics is correct. We need to test whether stopping metrics collection after shutting down the pipelines would cause any problems.