elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Loss of logs from logstash when rsyslog is restarted #14070

Open Tejaswi43 opened 2 years ago

Tejaswi43 commented 2 years ago

Hi Team,

We are using Logstash 3pp v7.15.2 in our project. When the remote syslog server restarts, Logstash takes 1-60 minutes to reconnect, and during this time none of the logs are "stored", so they are simply lost. After the connection between rsyslog and Logstash is re-established everything works fine, but the logs sent around the time of reconnection are lost. We expected them to be stored in the Logstash persistent queue while the rsyslog output is unavailable. We tried the following scenarios, with these findings:
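For reference, this is roughly how we enable the persistent queues (a sketch of our pipelines.yml; the pipeline ids and config paths are illustrative, but queue.max_bytes matches the 1073741824-byte max_queue_size_in_bytes in the stats below):

```yaml
# pipelines.yml (sketch) - one persistent queue per pipeline
- pipeline.id: syslog
  path.config: "/opt/logstash/pipeline/syslog.conf"   # illustrative path
  queue.type: persisted        # buffer events on disk while the output is down
  queue.max_bytes: 1gb         # = max_queue_size_in_bytes 1073741824 in the stats
- pipeline.id: elasticsearch
  path.config: "/opt/logstash/pipeline/elasticsearch.conf"
  queue.type: persisted
  queue.max_bytes: 1gb
```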

Case 1: Deployed Elasticsearch and Logstash (rsyslog output configured but rsyslog not deployed)

Stats of the rsyslog pipeline:

    "outputs" : [ {
      "id" : "e6ad9c62172fd69b98c84ca8fada4ab96cb2e1b4bb5bc5248eead15a4b9f4012",
      "events" : {
        "out" : 0,
        "in" : 1,
        "duration_in_millis" : 86
      },
      "name" : "syslog"
    } ]
  },
  "reloads" : {
    "last_failure_timestamp" : null,
    "successes" : 0,
    "failures" : 0,
    "last_error" : null,
    "last_success_timestamp" : null
  },
  "queue" : {
    "type" : "persisted",
    "capacity" : {
      "max_unread_events" : 0,
      "max_queue_size_in_bytes" : 1073741824,
      "queue_size_in_bytes" : 7904,
      "page_capacity_in_bytes" : 67108864
    },
    "events" : 7,
    "data" : {
      "free_space_in_bytes" : 29855367168,
      "storage_type" : "ext4",
      "path" : "/opt/logstash/data/queue/syslog"
    },
    "events_count" : 7,
    "queue_size_in_bytes" : 7904,
    "max_queue_size_in_bytes" : 1073741824

We can see that the rsyslog pipeline's persistent queue stores events while rsyslog is not deployed, and once rsyslog is deployed it receives all the logs.
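Pipeline stats like the ones above were pulled from the Logstash node stats API (assuming the default API endpoint on port 9600; the pipeline id matches our configuration):

```
GET http://localhost:9600/_node/stats/pipelines/syslog?pretty
```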

Case 2: Deployed Elasticsearch, Logstash, and rsyslog; restarted both Elasticsearch and rsyslog while sending logs

Pipeline stats for Elasticsearch:

   "name" : "elasticsearch"
    }, {
      "id" : "07aa8e0b7b6a3b03343369c6241012b28a5381ebbbb638b8ec25904c8e2f947b",
      "events" : {
        "out" : 0,
        "in" : 0,
        "duration_in_millis" : 68
      },
      "name" : "stdout"
    } ]
  },
  "reloads" : {
    "last_failure_timestamp" : null,
    "successes" : 0,
    "failures" : 0,
    "last_error" : null,
    "last_success_timestamp" : null
  },
  "queue" : {
    "type" : "persisted",
    "capacity" : {
      "max_unread_events" : 0,
      "max_queue_size_in_bytes" : 1073741824,
      "queue_size_in_bytes" : 29636,
      "page_capacity_in_bytes" : 67108864
    },
    "events" : 4,
    "data" : {
      "free_space_in_bytes" : 29845852160,
      "storage_type" : "ext4",
      "path" : "/opt/logstash/data/queue/elasticsearch"
    },
    "events_count" : 4,
    "queue_size_in_bytes" : 29636,
    "max_queue_size_in_bytes" : 1073741824
  },

Pipeline stats for rsyslog:

     "name" : "syslog"
    } ]
  },
  "reloads" : {
    "last_failure_timestamp" : null,
    "successes" : 0,
    "failures" : 0,
    "last_error" : null,
    "last_success_timestamp" : null
  },
  "queue" : {
    "type" : "persisted",
    "capacity" : {
      "max_unread_events" : 0,
      "max_queue_size_in_bytes" : 1073741824,
      "queue_size_in_bytes" : 29636,
      "page_capacity_in_bytes" : 67108864
    },
    "events" : 0,
    "data" : {
      "free_space_in_bytes" : 29845852160,
      "storage_type" : "ext4",
      "path" : "/opt/logstash/data/queue/syslog"
    },
    "events_count" : 0,
    "queue_size_in_bytes" : 29636,
    "max_queue_size_in_bytes" : 1073741824
  },

In this case, events are not stored in the syslog pipeline's persistent queue, but they are stored in the Elasticsearch pipeline's persistent queue.

And once Elasticsearch and rsyslog are restarted, the logs we sent while the outputs were disconnected arrive only in Elasticsearch, not in rsyslog.

Could you please share your comments on this? We think that Logstash is not detecting that rsyslog has restarted, so the logs are not stored in the persistent queue and are lost.

kares commented 2 years ago

Thanks for the report, this sounds like a plugin specific issue. The syslog output uses UDP by default and seems to be opinionated about re-trying exceptions when a write on the UDP socket fails.

Thus, if protocol => isn't set on the output (or is set to "udp"), you should change it to "tcp" so that socket write exceptions are retried.
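A minimal sketch of what that output change looks like (the host/port values are placeholders):

```
output {
  syslog {
    host     => "rsyslog"
    port     => 514
    protocol => "tcp"   # unlike the default "udp", failed socket writes are retried
  }
}
```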

Tejaswi43 commented 2 years ago

We have configured the protocol as tcp:

```
syslog {
  host => {{ .host | quote }}
  port => {{ .port }}
  protocol => tcp
}
```

We are getting exceptions like the one below:

```
[2022-04-28T01:13:34.034Z][WARN ][logstash.outputs.syslog ] syslog tcp output exception: closing, reconnecting and resending event {:host=>"rsyslog", :port=>514, :exception=>#<Errno::ECONNREFUSED: Connection refused - connect(2) for "rsyslog" port 514>, :backtrace=>["org/jruby/ext/socket/RubyTCPSocket.java:134:in `initialize'", "org/jruby/RubyIO.java:876:in `new'", "/opt/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-syslog-3.0.5.E001/lib/logstash/outputs/syslog.rb:219:in `connect'", "/opt/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-syslog-3.0.5.E001/lib/logstash/outputs/syslog.rb:187:in `publish'", "/opt/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-plain-3.1.0/lib/logstash/codecs/plain.rb:59:in `encode'", "/opt/logstash/logstash-core/lib/logstash/codecs/delegator.rb:48:in `block in encode'", "org/logstash/instrument/metrics/AbstractSimpleMetricExt.java:65:in `time'", "org/logstash/instrument/metrics/AbstractNamespacedMetricExt.java:64:in `time'", "/opt/logstash/logstash-core/lib/logstash/codecs/delegator.rb:47:in `encode'", "/opt/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-syslog-3.0.5.E001/lib/logstash/outputs/syslog.rb:147:in `receive'", "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:105:in `block in multi_receive'", "org/jruby/RubyArray.java:1820:in `each'", "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:105:in `multi_receive'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:143:in `multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/java_pipeline.rb:295:in `block in start_workers'"], :event=>#}
```

kares commented 2 years ago

:exception=>#<Errno::ECONNREFUSED: Connection refused - connect(2) for "rsyslog" port 514>

those exceptions are going to be retried by the output until a connection is possible - the event in that case is not ack-ed for the PQ and is expected to stay in the queue.

In this case events are not being stored in the syslog pipeline persistent queue but are stored in elastic search pipeline persistent queue.

Why were these separate pipelines mentioned together? Are you using a single output and directing the same data to multiple pipelines (using a pipeline output)?

Tejaswi43 commented 2 years ago

those exceptions are going to be retried by the output until a connection is possible - the event in that case is not ack-ed for the PQ and is expected to stay in the queue.

Yes, since the output is being retried, the events are expected to be stored in the queue, but they aren't, and for this reason we cannot get these events back after rsyslog reconnects.

you're using a single output and directing the same data to multiple pipelines (using a pipeline output) ?

Yes, actually I was just comparing by configuring two output pipelines, Elasticsearch and syslog, and found that the logs sent at the time of the restart reach only Elasticsearch, not rsyslog, after reconnection.

Tejaswi43 commented 2 years ago

@kares Is any other info needed? I can see the issue is labeled needed-info.

Tejaswi43 commented 2 years ago

@kares Is there any progress regarding this issue?

Tejaswi43 commented 2 years ago

@kares any update regarding the issue?