elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Logstash crash receiving data from filebeats #12169

Closed M9k closed 3 years ago

M9k commented 4 years ago

Hi all, I have a problem with an instance of Logstash, as already described here: discuss.elastic.co

Every time it receives a document from Filebeat, it crashes. Filebeat sends the data correctly only the first time, but Logstash does not process it and just throws the exception below, without forwarding anything to Elasticsearch. Filebeat then stops sending data, reporting a "connection reset by peer" error during transmission. There is nothing between the two machines; they are on the same subnet, so no routing or firewall can block the connection.

In any case, Winlogbeat (installed on multiple hosts, including the same machine as Filebeat) keeps working without any interruption or problem. I tried removing SSL, changing the port, and rebooting the Logstash machine, but the problem is still there. I tried running Filebeat from my PC with the same configuration, sending some fake DHCP logs: same crash.

The configuration worked fine for ~10 days, then it stopped working without any edit or upgrade. The same configuration works fine on other machines without any problem.

Could it be a problem with the configuration of the network interface in the hypervisor?

This is my filebeat configuration:

    filebeat.inputs:
      - paths:
          - c:\windows\system32\dhcp\DhcpSrvLog-*.log
        input_type: log
        include_lines: ["^[0-9]"]
        document_type: dhcp
        close_removed: false
        clean_removed: false
        ignore_older: 47h
        clean_inactive: 48h
        fields:
          type: dhcp
        fields_under_root: true
    filebeat.config.modules:
      # Glob pattern for configuration loading
      path: ${path.config}/modules.d/*.yml

      # Set to true to enable config reloading
      reload.enabled: false

    #================================ Outputs =====================================
    #----------------------------- Logstash output --------------------------------
    output.logstash:
      hosts: ["192.168.10.119:5044"]
      ssl:
          certificate_authorities: ["c:/PATH/logstash-remote.crt"]

And this is the logstash configuration:

    input {
      beats {
        port => 5044
        ssl => true
        ssl_certificate => "/etc/logstash/ssl/logstash-remote.crt"
        ssl_key => "/etc/logstash/ssl/logstash-remote.key"
        type => "beats"
      }
    }
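When debugging a setup like this, one quick check is whether the beats port is even reachable at the plain TCP level, independent of TLS and the beats protocol. A minimal sketch (the address and port below are the ones from the configurations above; `tcp_port_open` is a hypothetical helper, not part of Beats or Logstash):

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example against the beats input from the configs above:
# tcp_port_open("192.168.10.119", 5044)
```

If the TCP connect succeeds but Filebeat still sees resets, the problem is above the transport layer (TLS handshake or beats framing) rather than basic connectivity.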

And this is the Logstash crash log (the Italian "Connessione interrotta dal corrispondente" is the locale's rendering of "Connection reset by peer"):

    [DEBUG] 2020-08-10 16:37:16.399 [defaultEventExecutorGroup-4-2] BeatsHandler - [local: 192.168.10.119:5044, remote: 192.168.10.111:53998] Received a new payload
    [DEBUG] 2020-08-10 16:37:16.399 [defaultEventExecutorGroup-4-2] BeatsHandler - [local: 192.168.10.119:5044, remote: 192.168.10.111:53998] Sending a new message for the listener, sequence: 1
    [DEBUG] 2020-08-10 16:37:16.401 [defaultEventExecutorGroup-4-2] BeatsHandler - 6c5e698d: batches pending: false
    [DEBUG] 2020-08-10 16:37:16.507 [[main]>worker1] decorators - filters/LogStash::Filters::Mutate: adding value to field {"field"=>"hostname_lc", "value"=>["%{[agent][hostname]}"]}
    [DEBUG] 2020-08-10 16:37:16.507 [[main]>worker1] mutate - filters/LogStash::Filters::Mutate: removing field {:field=>"host"}
    [DEBUG] 2020-08-10 16:37:16.515 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0x2a1345ec>}
    [DEBUG] 2020-08-10 16:37:16.516 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0x2a1345ec>}
    [DEBUG] 2020-08-10 16:37:16.551 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0x63f92aaa>}
    [DEBUG] 2020-08-10 16:37:16.552 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0x63f92aaa>}
    [DEBUG] 2020-08-10 16:37:16.553 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0x7901b925>}
    [DEBUG] 2020-08-10 16:37:16.553 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0x7901b925>}
    [DEBUG] 2020-08-10 16:37:16.777 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0x254e90cb>}
    [DEBUG] 2020-08-10 16:37:16.777 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0x254e90cb>}
    [DEBUG] 2020-08-10 16:37:16.778 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0x7b3fcbab>}
    [DEBUG] 2020-08-10 16:37:16.778 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0x7b3fcbab>}
    [DEBUG] 2020-08-10 16:37:16.784 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0xdfd4890>}
    [DEBUG] 2020-08-10 16:37:16.784 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0xdfd4890>}
    [DEBUG] 2020-08-10 16:37:16.802 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0x51fdffb1>}
    [DEBUG] 2020-08-10 16:37:16.803 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0x51fdffb1>}
    [DEBUG] 2020-08-10 16:37:16.803 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0x6c5363d6>}
    [DEBUG] 2020-08-10 16:37:16.804 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0x6c5363d6>}
    [DEBUG] 2020-08-10 16:37:16.804 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0x642c7e6e>}
    [DEBUG] 2020-08-10 16:37:16.805 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0x642c7e6e>}
    [DEBUG] 2020-08-10 16:37:16.805 [Ruby-0-Thread-17: :1] grok - Running grok filter {:event=>#<LogStash::Event:0x1c1bbcfa>}
    [DEBUG] 2020-08-10 16:37:16.806 [Ruby-0-Thread-17: :1] grok - Event now:  {:event=>#<LogStash::Event:0x1c1bbcfa>}
    [DEBUG] 2020-08-10 16:37:16.855 [defaultEventExecutorGroup-4-1] BeatsHandler - [local: 192.168.10.119:5044, remote: 192.168.10.111:53999] Handling exception: Connessione interrotta dal corrispondente
    java.io.IOException: Connessione interrotta dal corrispondente
            at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_262]
            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_262]
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_262]
            at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_262]
            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377) ~[?:1.8.0_262]
            at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.49.Final.jar:4.1.49.Final]
            at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
    [INFO ] 2020-08-10 16:37:16.862 [defaultEventExecutorGroup-4-1] BeatsHandler - [local: 192.168.10.119:5044, remote: 192.168.10.111:53999] Handling exception: Connessione interrotta dal corrispondente
    [WARN ] 2020-08-10 16:37:16.865 [nioEventLoopGroup-2-4] DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
    java.io.IOException: Connessione interrotta dal corrispondente
            at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_262]
            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_262]
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_262]
            at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_262]
            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377) ~[?:1.8.0_262]
            at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.49.Final.jar:4.1.49.Final]
            at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.49.Final.jar:4.1.49.Final]
            at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
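The `IOException` above is the server-side view of a TCP RST arriving mid-read. Purely as an illustration (a self-contained local sketch, not the beats protocol: the loopback addresses and the `SO_LINGER` trick are demonstration devices), the same "connection reset by peer" condition can be reproduced:

```python
import socket
import struct
import threading

def rst_server(ready, port_box):
    # Accept one client, then close with SO_LINGER=0 so the kernel
    # sends a TCP RST instead of a clean FIN -- the peer then sees
    # "connection reset by peer", like the stack trace above.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port_box.append(srv.getsockname()[1])
    ready.set()
    conn, _ = srv.accept()
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))  # linger on, timeout 0 => RST
    conn.close()
    srv.close()

ready, port_box = threading.Event(), []
t = threading.Thread(target=rst_server, args=(ready, port_box))
t.start()
ready.wait()

client = socket.create_connection(("127.0.0.1", port_box[0]))
try:
    client.sendall(b"2W")  # any bytes; the peer resets regardless
    client.recv(1024)      # the RST surfaces here
    outcome = "clean close"
except (ConnectionResetError, BrokenPipeError):
    outcome = "connection reset by peer"
finally:
    client.close()
    t.join()
print(outcome)
```

The point of the sketch is that the exception says nothing about *why* the peer (or a middlebox) reset the connection; that has to be diagnosed at the network layer.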
lanrekkeeg commented 3 years ago

@M9k any luck on this issue?

M9k commented 3 years ago

> @M9k any luck on this issue?

No. I also have ~10 more machines with the same configuration, same OS, same sources, etc., and they have all worked fine for many months, but this one never worked. I never understood why. I tried redoing the entire VM, changing the network adapter type, using another Filebeat, copying the configuration from other Filebeat instances, etc., but nothing helped.

What are you trying to send with Filebeat? I use it with DHCP files.

lanrekkeeg commented 3 years ago

I am sending general logs from one machine using the Filebeat agent to another server that has Logstash configured. Somehow Logstash is throwing "connection reset by peer" out of nowhere. I checked tcpdump on the Logstash server: I can see data arriving at the server, but it is not accepted by Logstash.

kares commented 3 years ago

Connection resets are usually expected and might depend on potential TCP traffic filtering along the wire. Here it seems Beats and Logstash are on the same network (local: 192.168.10.119:5044, remote: 192.168.10.111:53999).

> I try to remove the SSL, change port, reboot the logstash machine, but the problem is still there. I try to use FileBeat from my pc with the same configuration sending some fake dhcp logs, same crash.

> The configuration worked fine for ~10 days, then it stopped working without any edit or upgrade. The same configuration works fine on others machine without any problem.

These all indicate an issue external to LS (something to be fixed at a lower layer). Maybe a system update ran on one of the machines and something network-related changed.

Anyhow, this would have been a proper question for the Discuss forum; not sure what else to do here, as this does not seem like an LS bug.