Closed · PMDubuc closed this issue 8 years ago
I have seen this once on my own system, but I was unable to work out what was happening internally, and unfortunately I've not been able to reproduce it locally.
If you can enable `log level: debug` in your configuration, that should shed some light if it happens again. If you're sending logs to syslog, it sometimes filters out debug-level messages, so you may need to set `log file: /var/log/something` to drop the logs somewhere directly. If the file needs rotating because it gets big, you can reload Log Courier and it will reopen its log files.
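A minimal sketch of what that configuration might look like, assuming the keys live under a `general` section of the JSON config and using a hypothetical log file path:

```json
{
  "general": {
    "log level": "debug",
    "log file": "/var/log/log-courier.log"
  }
}
```

After changing the file, a reload (rather than a restart) should be enough to pick up the new settings and reopen the log file.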
I'll keep looking into this.
Just want to note an observation here so I don't forget. I saw 14 pending payloads in total, but it appears that on reconnect it would only resend one payload. Checking Logstash, I could see an acknowledgement was sent for this payload, though I couldn't confirm it was received (it looked to be). The endpoint line count did not increase either. On each reconnect it's the same single payload.
I found one possible cause for this, which would be fairly random: the payload is not reset when it gets retransmitted. So if it was previously "half-acknowledged" before it timed out, it might enter a situation where the ACK from the server is dropped and the payload keeps timing out.
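That failure mode can be sketched roughly as follows. The type and method names here are hypothetical illustrations, not log-courier's actual internals; the point is only that a stale acknowledgement mark survives retransmission unless it is explicitly cleared:

```go
package main

import "fmt"

// Payload is a hypothetical stand-in for a batch of events awaiting ACK.
type Payload struct {
	events  []string
	ackedTo int // events [0, ackedTo) already acknowledged by the server
}

// Ack records the server acknowledging the first n events.
// An ACK at or below the current mark is dropped as stale/duplicate,
// which is exactly what keeps happening to a half-acknowledged payload
// that is retransmitted without a reset.
func (p *Payload) Ack(n int) bool {
	if n <= p.ackedTo {
		return false // dropped: looks like a duplicate ACK
	}
	p.ackedTo = n
	return p.ackedTo == len(p.events) // true once fully acknowledged
}

// Resend retransmits the whole payload. Clearing the acknowledgement
// progress here is the essence of the fix described above; without it,
// partial ACKs for the resent copy are discarded and the payload
// times out again and again.
func (p *Payload) Resend() []string {
	p.ackedTo = 0 // the fix: reset progress before retransmitting
	return p.events
}

func main() {
	p := &Payload{events: []string{"e1", "e2", "e3", "e4"}}
	p.Ack(2)   // half-acknowledged, then the connection times out
	p.Resend() // retransmit after reconnect, resetting progress
	fmt.Println(p.Ack(2)) // → false (accepted, but payload not yet complete)
	fmt.Println(p.Ack(4)) // → true (fully acknowledged)
}
```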
Will roll up a 2.0.4 tonight to fix this.
I recently had a problem that caused Logstash to start sending invalid field values to Elasticsearch, which rejected the events. This caused the Logstash pipeline to back up, and all the log-courier processes started failing with messages on stdout like the following. The incident lasted about 10 hours. It was corrected, and Logstash (2.3.2) and Elasticsearch (2.3.3) were restarted. The log-courier processes didn't recover until they were all restarted a few hours later. Is there a way to fix this?