Graylog2 / graylog-plugin-beats

[DEPRECATED] Elastic Beats Input plugin for Graylog
https://www.graylog.org/
GNU General Public License v3.0

Missing protocol version in error about unknown protocol version #14

Closed: hc4 closed this issue 8 years ago.

hc4 commented 8 years ago

I just upgraded to 1.1.2 and got this error:

java.lang.Exception: Unknown beats protocol version: {}
        at org.graylog.plugins.beats.BeatsFrameDecoder.checkVersion(BeatsFrameDecoder.java:155) ~[?:?]

According to the source code, the format argument is missing. Strangely, I've only seen this error once... Maybe it was some network error.
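The literal `{}` in the message suggests an SLF4J-style placeholder was passed to a plain `Exception` constructor, which performs no substitution. A minimal sketch, assuming the message was built that way (the actual code at BeatsFrameDecoder.java:155 may differ; `VersionMessageDemo`, `unknownVersionBroken` and `unknownVersion` are hypothetical names):

```java
public class VersionMessageDemo {
    // Assumed shape of the bug: "{}" is an SLF4J placeholder, but java.lang.Exception
    // does not format its message, so the braces are printed literally.
    static Exception unknownVersionBroken(byte version) {
        return new Exception("Unknown beats protocol version: {}");
    }

    // Fix: format the offending version byte into the message explicitly.
    static Exception unknownVersion(byte version) {
        return new Exception(String.format("Unknown beats protocol version: %d", version));
    }

    public static void main(String[] args) {
        System.out.println(unknownVersionBroken((byte) 87).getMessage()); // Unknown beats protocol version: {}
        System.out.println(unknownVersion((byte) 87).getMessage());       // Unknown beats protocol version: 87
    }
}
```

Printing the actual byte value would have shown immediately which byte the decoder mistook for a version.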

joschi commented 8 years ago

@hc4 Which beat are you using and in which version?

hc4 commented 8 years ago

I'm using filebeat and topbeat, the same version on every host. But I got the error only once :) So the problem is surely not in the beats.

joschi commented 8 years ago

@hc4

I'm using filebeat and topbeat, the same version on every host.

And the version being…?

Additionally, please attach the configuration for each beat and the configuration of your Beats input.

hc4 commented 8 years ago

1.2.3

joschi commented 8 years ago

@hc4 And the configuration for each beat and the configuration of your Beats input…

hc4 commented 8 years ago

I'll check the configs on Monday. I looked at the code and have a question: shouldn't there be a `break` after line 82?
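If line 82 ends a `case` of a `switch` over the decoder state, a missing `break` lets execution fall through into the next case. A generic sketch of that effect (the `FallThroughDemo` class, its `State` enum and `handle` method are hypothetical, not the plugin's code). Note that fall-through is sometimes used deliberately in state-machine decoders, so whether it is a bug depends on the surrounding code:

```java
import java.util.ArrayList;
import java.util.List;

public class FallThroughDemo {
    enum State { PROTOCOL_VERSION, FRAME_TYPE, WINDOW_SIZE }

    // Records which case bodies run for the given state.
    static List<String> handle(State state, boolean withBreak) {
        List<String> ran = new ArrayList<>();
        switch (state) {
            case PROTOCOL_VERSION:
                ran.add("PROTOCOL_VERSION");
                if (withBreak) break;
                // without a break, control falls through into the FRAME_TYPE case
            case FRAME_TYPE:
                ran.add("FRAME_TYPE");
                break;
            case WINDOW_SIZE:
                ran.add("WINDOW_SIZE");
                break;
        }
        return ran;
    }

    public static void main(String[] args) {
        System.out.println(handle(State.PROTOCOL_VERSION, true));  // [PROTOCOL_VERSION]
        System.out.println(handle(State.PROTOCOL_VERSION, false)); // [PROTOCOL_VERSION, FRAME_TYPE]
    }
}
```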

hc4 commented 8 years ago

Filebeat output config:

  logstash:
    hosts: ["1.1.1.1:5044", "2.2.2.2:5044"]
    max_retries: -1
    tls:
      certificate_authorities: ["pem file"]
      insecure: false

Beats input:

{
 "title": "Beats TLS",
 "global": false,
 "name": "Beats",
 "content_pack": null,
 "created_at": "2016-10-14T09:33:24.342Z",
 "type": "org.graylog.plugins.beats.BeatsInput",
 "creator_user_id": "user id",
 "attributes": {
   "recv_buffer_size": 212992,
   "port": 5044,
   "tls_key_file": "key file",
   "tls_enable": true,
   "tls_key_password": "key pass",
   "tcp_keepalive": false,
   "tls_client_auth_cert_file": "",
   "tls_client_auth": "disabled",
   "override_source": "",
   "bind_address": "0.0.0.0",
   "tls_cert_file": "crt file"
 },
 "static_fields": {},
 "node": "node id",
 "id": "input id"
}

joschi commented 8 years ago

@hc4

Filebeat output config:

That's not the complete configuration. Please provide the complete configuration of both of your beats…

hc4 commented 8 years ago

Which section exactly do you need? The configs may contain some sensitive information. For example, I'm not sure whether you need the prospectors config.

hc4 commented 8 years ago

filebeat:
  registry_file: registry
  config_dir: config dir
output:
  logstash:
    hosts: ["1.1.1.1:5044", "2.2.2.2:5044"]
    max_retries: -1
    tls:
      certificate_authorities: ["pem file"]
      insecure: false
shipper:
logging:
  to_files: true
  files:
    path: logs path
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 7
  selectors: ["*"]
  level: info

hc4 commented 8 years ago

The problem occurs several times per day. I think it's caused by an unstable network connection (the server is located in China and has a poor internet connection).

hc4 commented 8 years ago

I just looked deeper into the code. It seems there is a bug in how ReplayingDecoder is used. The methods processWindowSizeFrame, parseDataFrame, processCompressedFrame and parseJsonFrame check whether enough data is available in the buffer and, if not, reset the reader index to the last checkpoint (by calling channelBuffer.resetReaderIndex()). At the moment these methods are called, the last checkpoint is FRAME_TYPE. But after these methods have processed the buffer, the checkpoint is always changed to PROTOCOL_VERSION. So the next decode call assumes the checkpoint is PROTOCOL_VERSION, while the actual reader index points at FRAME_TYPE. The first byte of the frame type is then processed as the version, and the error is thrown.
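The sequence described above can be reproduced with a small self-contained model. This is a hedged sketch, not Netty's ReplayingDecoder or the plugin's code: `CheckpointDemo`, `Decoder`, `feed`, and the simplified framing (only window-size frames: `'2'`, `'W'`, then a 4-byte big-endian size, loosely modelled on the Beats protocol) are assumptions for illustration. A frame split across two reads is decoded correctly when the checkpoint stays at FRAME_TYPE, and produces the "unknown version" error when the state is checkpointed as PROTOCOL_VERSION while the reader index still points at the frame type byte.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model, not Netty and not the plugin: only '2' 'W' <uint32> frames are handled.
public class CheckpointDemo {
    enum State { PROTOCOL_VERSION, FRAME_TYPE }

    static final class Decoder {
        private final boolean buggy;
        private byte[] data = new byte[0];      // every byte received so far
        private State state = State.PROTOCOL_VERSION;
        private int readerIndex = 0;            // position saved by the last checkpoint
        final List<String> events = new ArrayList<>();

        Decoder(boolean buggy) { this.buggy = buggy; }

        // Save the state together with the reader index it applies to.
        private void checkpoint(State s, int index) { state = s; readerIndex = index; }

        void feed(byte... bytes) {
            byte[] grown = Arrays.copyOf(data, data.length + bytes.length);
            System.arraycopy(bytes, 0, grown, data.length, bytes.length);
            data = grown;
            decode();
        }

        private void decode() {
            while (true) {
                int pos = readerIndex;
                if (state == State.PROTOCOL_VERSION) {
                    if (pos >= data.length) return;              // wait for more data
                    if (data[pos] != '2') {
                        events.add("Unknown beats protocol version: " + (char) data[pos]);
                        return;
                    }
                    checkpoint(State.FRAME_TYPE, pos + 1);
                } else {                                         // State.FRAME_TYPE
                    if (pos + 5 > data.length) {                 // frame not complete yet
                        if (buggy) {
                            // Modelled bug: the reader index was rewound to the frame type
                            // byte, but the state is checkpointed as PROTOCOL_VERSION.
                            checkpoint(State.PROTOCOL_VERSION, pos);
                        }
                        return;                                  // correct: keep (FRAME_TYPE, pos)
                    }
                    if (data[pos] != 'W') {
                        events.add("Unsupported frame type: " + (char) data[pos]);
                        return;
                    }
                    int windowSize = ByteBuffer.wrap(data, pos + 1, 4).getInt();
                    events.add("window size " + windowSize);
                    checkpoint(State.PROTOCOL_VERSION, pos + 5);
                }
            }
        }
    }

    public static void main(String[] args) {
        for (boolean buggy : new boolean[] {false, true}) {
            Decoder d = new Decoder(buggy);
            d.feed((byte) '2', (byte) 'W', (byte) 0, (byte) 0); // frame split across two reads
            d.feed((byte) 0, (byte) 10);                         // remainder of the frame
            System.out.println((buggy ? "buggy: " : "fixed: ") + d.events);
        }
    }
}
```

Running `main` prints `fixed: [window size 10]` followed by `buggy: [Unknown beats protocol version: W]`: the frame type byte `'W'` ends up being read as the protocol version.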

hc4 commented 8 years ago

I removed all buffer checks from the code and will test it for a while. There is a possible bug in my version: if a compressed frame contains an invalid data frame, the decoder will keep trying to parse the same broken message forever. But this shouldn't happen if the protocol is implemented correctly on both sides :)

Fixed decoder: BeatsFrameDecoder.java. If the problem doesn't occur in the next few days, I can open a PR.
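Dropping the manual availability checks relies on the replay mechanism itself: any read past the received data aborts the attempt, and decoding restarts from the last checkpoint with the state unchanged. A minimal model of that idea, again hypothetical: `ReplayDemo`, `decodeOne`, and the `Replay` exception (standing in for Netty's internal replay signal) are assumptions, and the framing is the same simplified `'2'`, `'W'`, 4-byte size layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the replay idea; not Netty's ReplayingDecoder.
public class ReplayDemo {
    enum State { PROTOCOL_VERSION, FRAME_TYPE }

    // Thrown when a read runs past the received data (stands in for the replay signal).
    private static final class Replay extends RuntimeException {
        Replay() { super(null, null, false, false); } // no stack trace needed
    }

    static final class Decoder {
        private byte[] data = new byte[0];
        private State state = State.PROTOCOL_VERSION;
        private int checkpointIndex = 0;  // reader index saved with the state
        private int pos;                  // reader index during one decode attempt
        final List<String> events = new ArrayList<>();

        private byte readByte() {
            if (pos >= data.length) throw new Replay();
            return data[pos++];
        }

        private int readInt() {           // big-endian 32-bit integer
            int value = 0;
            for (int i = 0; i < 4; i++) value = (value << 8) | (readByte() & 0xFF);
            return value;
        }

        private void checkpoint(State s) { state = s; checkpointIndex = pos; }

        void feed(byte... bytes) {
            byte[] grown = Arrays.copyOf(data, data.length + bytes.length);
            System.arraycopy(bytes, 0, grown, data.length, bytes.length);
            data = grown;
            while (true) {
                pos = checkpointIndex;        // every attempt starts at the checkpoint
                try {
                    if (!decodeOne()) return;
                } catch (Replay notEnoughData) {
                    return;                   // state and checkpointIndex are untouched
                }
            }
        }

        // No manual "enough bytes?" checks: running out of data throws Replay.
        private boolean decodeOne() {
            switch (state) {
                case PROTOCOL_VERSION:
                    byte version = readByte();
                    if (version != '2') {
                        events.add("Unknown beats protocol version: " + version);
                        return false;
                    }
                    checkpoint(State.FRAME_TYPE);
                    return true;
                case FRAME_TYPE:
                    byte type = readByte();
                    if (type != 'W') { events.add("Unsupported frame type: " + type); return false; }
                    events.add("window size " + readInt());
                    checkpoint(State.PROTOCOL_VERSION);
                    return true;
                default:
                    throw new IllegalStateException("unknown state " + state);
            }
        }
    }

    public static void main(String[] args) {
        byte[] stream = {'2', 'W', 0, 0, 0, 10, '2', 'W', 0, 0, 0, 7};
        Decoder d = new Decoder();
        for (byte b : stream) d.feed(b);      // worst case: one byte per read
        System.out.println(d.events);         // [window size 10, window size 7]
    }
}
```

Even when the two frames arrive one byte at a time, both are decoded, because every failed attempt leaves the state and the saved reader index exactly where the last successful checkpoint put them.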

hc4 commented 8 years ago

I just got the incorrect version message again, but this time it happened during cleanup after a disconnect. During cleanup, ReplayingDecoder simply ignores the REPLAY exception. So instead of stopping the read, BeatsFrameDecoder thinks that everything has been read and incorrectly sets the state to PROTOCOL_VERSION. I added an empty implementation of decodeLast(). It is empty because a frame that has been read must be ACKed, which is impossible after the beat has disconnected. New version: BeatsFrameDecoder.java

hc4 commented 8 years ago

Fixed the decodeLast logic (the method must read all data from the buffer). No PROTOCOL_VERSION errors so far (3+ days), only "Connection timed out" (as expected). BeatsFrameDecoder.java
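The decodeLast() fix can be sketched the same way. In this hypothetical model (`DecodeLastDemo` and its methods are assumptions, not Netty's `decodeLast` signature), `decodeLast` runs once when the beat disconnects: since a partial frame can no longer be completed or ACKed, it consumes every remaining byte without emitting anything, so no leftover bytes remain to be misread as a protocol version.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of close-time handling; not Netty's decodeLast() signature.
public class DecodeLastDemo {
    static final class Decoder {
        private byte[] data = new byte[0];
        private int readerIndex = 0;
        final List<String> frames = new ArrayList<>();

        void feed(byte... bytes) {
            byte[] grown = Arrays.copyOf(data, data.length + bytes.length);
            System.arraycopy(bytes, 0, grown, data.length, bytes.length);
            data = grown;
            // Decode every complete 6-byte frame: '2', frame type, 4-byte big-endian size
            // (the frame type byte is not validated in this sketch).
            while (readableBytes() >= 6) {
                if (data[readerIndex] != '2') {
                    throw new IllegalStateException("Unknown beats protocol version: " + data[readerIndex]);
                }
                frames.add("window size " + ByteBuffer.wrap(data, readerIndex + 2, 4).getInt());
                readerIndex += 6;
            }
        }

        int readableBytes() { return data.length - readerIndex; }

        // Called once after the beat disconnects. A partial frame can no longer be
        // completed or ACKed, so consume all remaining bytes without emitting anything.
        int decodeLast() {
            int discarded = readableBytes();
            readerIndex = data.length;
            return discarded;
        }
    }

    public static void main(String[] args) {
        Decoder d = new Decoder();
        d.feed((byte) '2', (byte) 'W', (byte) 0, (byte) 0, (byte) 0, (byte) 5,
               (byte) '2', (byte) 'W', (byte) 0);                              // second frame cut off
        System.out.println(d.frames + ", pending " + d.readableBytes());     // [window size 5], pending 3
        System.out.println("discarded " + d.decodeLast() + ", pending " + d.readableBytes()); // discarded 3, pending 0
    }
}
```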