Closed hc4 closed 8 years ago
@hc4 Which beat are you using and in which version?
I'm using filebeat and topbeat, the same version on every host. But I got the error only once :) So the problem is surely not in the beats.
@hc4
I'm using filebeat and topbeat same version on every host.
And the version being…?
Additionally, please attach the configuration for each beat and the configuration of your Beats input.
1.2.3
@hc4 And the configuration for each beat and the configuration of your Beats input…
I'll check the configs on Monday. I looked in the code and have a question: shouldn't there be a break after line 82?
Filebeat output config:
logstash:
  hosts: ["1.1.1.1:5044", "2.2.2.2:5044"]
  max_retries: -1
  tls:
    certificate_authorities: ["pem file"]
    insecure: false
Beats input:
{
  "title": "Beats TLS",
  "global": false,
  "name": "Beats",
  "content_pack": null,
  "created_at": "2016-10-14T09:33:24.342Z",
  "type": "org.graylog.plugins.beats.BeatsInput",
  "creator_user_id": "user id",
  "attributes": {
    "recv_buffer_size": 212992,
    "port": 5044,
    "tls_key_file": "key8 file",
    "tls_enable": true,
    "tls_key_password": "key pass",
    "tcp_keepalive": false,
    "tls_client_auth_cert_file": "",
    "tls_client_auth": "disabled",
    "override_source": "",
    "bind_address": "0.0.0.0",
    "tls_cert_file": "crt file"
  },
  "static_fields": {},
  "node": "node id",
  "id": "input id"
}
@hc4
Filebeat output config:
That's not the complete configuration. Please provide the complete configuration of both of your beats…
Which section exactly do you need? The configs may contain some sensitive information, e.g. I'm not sure you need the prospectors config.
filebeat:
  registry_file: registry
  config_dir: config dir
output:
  logstash:
    hosts: ["1.1.1.1:5044", "2.2.2.2:5044"]
    max_retries: -1
    tls:
      certificate_authorities: ["pem file"]
      insecure: false
shipper:
logging:
  to_files: true
  files:
    path: logs path
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 7
  selectors: ["*"]
  level: info
The problem repeats several times per day. I think it's caused by an unstable network connection (the server is located in China with bad internet).
I just looked deeper into the code. It seems there is a bug in the ReplayingDecoder usage. The methods processWindowSizeFrame, parseDataFrame, processCompressedFrame and parseJsonFrame check whether enough data is available in the buffer and, if not, reset the read index to the last checkpoint (by calling channelBuffer.resetReaderIndex()). At the moment these methods are called, the last checkpoint is FRAME_TYPE. But after the buffer has been processed by these methods, the checkpoint is always changed to PROTOCOL_VERSION. So the next decode call assumes the checkpoint is PROTOCOL_VERSION, while the actual read index points to FRAME_TYPE. The frame type byte is then processed as the version and the error gets thrown.
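To make the mismatch described above easier to follow, here is a minimal, self-contained sketch. It is not the Graylog code and does not use the Netty API: `ReplaySketch`, its `Replay` signal, the `manualBufferCheck` flag and the restriction to window-size frames (version byte `2`, type byte `W`, 4-byte size) are all assumptions made for illustration. With `manualBufferCheck` set, the frame handler rewinds the read index itself and the state still advances to PROTOCOL_VERSION, which is the behaviour suspected in the comment above. Without it, a missing byte raises the replay signal and the state stays at the checkpoint.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a replaying frame decoder (not Netty). */
class ReplaySketch {
    enum State { PROTOCOL_VERSION, FRAME_TYPE }

    /** Stand-in for the "not enough data yet" signal of a replaying decoder. */
    private static final class Replay extends RuntimeException {}

    private final boolean manualBufferCheck; // true models the suspected bug
    private final ByteArrayOutputStream received = new ByteArrayOutputStream();
    private int readerIndex;
    private int checkpointIndex;
    private State state = State.PROTOCOL_VERSION;
    final List<Integer> windowSizes = new ArrayList<>();

    ReplaySketch(boolean manualBufferCheck) { this.manualBufferCheck = manualBufferCheck; }

    /** Remembers the current read position together with the next state. */
    private void checkpoint(State next) { checkpointIndex = readerIndex; state = next; }

    private int readByte(byte[] buf) {
        if (readerIndex >= buf.length) throw new Replay();
        return buf[readerIndex++] & 0xFF;
    }

    /** Appends a network chunk and decodes as many frames as possible. */
    void feed(byte[] chunk) {
        received.write(chunk, 0, chunk.length);
        byte[] buf = received.toByteArray();
        try {
            while (readerIndex < buf.length) decode(buf);
        } catch (Replay r) {
            readerIndex = checkpointIndex; // rewind; the state stays at the checkpoint
        }
    }

    private void decode(byte[] buf) {
        switch (state) {
            case PROTOCOL_VERSION:
                int version = readByte(buf);
                if (version != '2') {
                    throw new IllegalStateException("Unknown protocol version: " + version);
                }
                checkpoint(State.FRAME_TYPE);
                // fall through
            case FRAME_TYPE:
                int type = readByte(buf);
                if (type != 'W') throw new IllegalStateException("Only window frames are modelled");
                if (manualBufferCheck && buf.length - readerIndex < 4) {
                    readerIndex = checkpointIndex; // rewinds to the frame type byte
                } else {
                    int size = 0;
                    for (int i = 0; i < 4; i++) size = (size << 8) | readByte(buf);
                    windowSizes.add(size);
                }
                // Reached even after the manual rewind above, so the state
                // and the read position no longer agree.
                checkpoint(State.PROTOCOL_VERSION);
        }
    }

    public static void main(String[] args) {
        ReplaySketch fixed = new ReplaySketch(false);
        fixed.feed(new byte[] {'2', 'W', 0}); // frame split across two chunks
        fixed.feed(new byte[] {0, 0, 10});
        System.out.println("replay signal: " + fixed.windowSizes);
        try {
            new ReplaySketch(true).feed(new byte[] {'2', 'W', 0});
        } catch (IllegalStateException e) {
            System.out.println("manual check: " + e.getMessage());
        }
    }
}
```

In this model the error only shows up when a frame is split across chunks, which would fit the observation that it happens on an unstable connection. When the whole frame arrives in one chunk, both variants decode it the same way.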
I removed all buffer checks from the code and will test for some time. There is a possible bug in my version: if a compressed frame contains an incorrect data frame, the decoder will forever try to parse the same broken message. But this shouldn't happen if the protocol is implemented correctly on both sides :)
Fixed decoder: BeatsFrameDecoder.java. If the problem does not occur in the next few days, I can make a PR.
I just got the incorrect version message again, but this time it was during cleanup after a disconnect. During cleanup, ReplayingDecoder just ignores the REPLAY exception. So instead of stopping the read, BeatsFrameDecoder thinks everything was read and incorrectly sets the state to PROTOCOL_VERSION. I added an empty implementation of decodeLast(). It is empty because a read frame must be ACKed, but that is impossible after the beat has disconnected. New version: BeatsFrameDecoder.java
Fixed the decodeLast logic (the method must read all data from the buffer). No PROTOCOL_VERSION errors so far (3+ days), only "Connection timed out" (as expected). BeatsFrameDecoder.java
Just upgraded to 1.1.2 and got an error:
According to the sources, a format argument is missing. The strange thing is that I got this error only once... Maybe some network error.