What is your configuration? Also, the full logs from start to finish would help; it looks like you only uploaded snippets.
Regarding Redis: now that Log Courier connects to multiple indexers, it generally doesn't need this kind of buffer. I have a cluster at the moment where hundreds of machines send logs into a couple of balanced indexers. The protocol works really well at avoiding timeouts and maximising resource usage (I run Logstash at almost 100% for several parts of the day). There may be other use cases for Redis though, and I'm happy to hear them - but currently I don't plan to add it, as it moves courier away from its current guarantee of at-least-once delivery, since Redis is not really meant for non-volatile storage.
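Concretely, it's just a matter of listing every indexer in the network section and letting courier spread the load itself - something like this (hostnames made up; courier's JSON config accepts # comments, so I've annotated inline):

```
{
    "network": {
        # All balanced indexers - courier handles distribution across them
        "servers": [
            "indexer1.example.com:5043",
            "indexer2.example.com:5043",
            "indexer3.example.com:5043"
        ]
    }
}
```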
I've already attached config files and logs in the first post :)
I see 3 major advantages of redis output in log-courier:
I agree that Redis is a volatile 'method of data storage' :) You can avoid data loss by using clustered Redis, or partially by doing dumps. On the other hand, Logstash also has an internal cache (so we can lose some data when the process fails), and in the end Elasticsearch doesn't guarantee that 100% of our data will be kept forever.
The attached log files in the pastebins are incomplete. The Log Courier one, for example, is only 35 lines, so a lot of information about when things were queued, what timed out, and why is missing.
I only just noticed the config. I'm sorry!
Your network timeout is too low. It breaks the protocol (this is my fault - sorry - it could be handled better). The 2 second timeout expires before Logstash can finish processing the payload. The plugin on the Logstash side is being rewritten as we speak, but currently it only sends a keep-alive every 8 seconds, so the timeout in Log Courier really must be at least 12 seconds - there is little reason to change it from the default of 15 though, so maybe just remove that setting.
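In other words, the network section should look roughly like this (hostname made up):

```
{
    "network": {
        "servers": [ "logstash.example.com:5043" ],
        # Leave this at the 15 second default, or omit the key entirely.
        # Anything under ~12s races the plugin's 8-second keep-alive.
        "timeout": 15
    }
}
```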
I've increased the timeout to 30s and... it works :) But there seem to be some performance problems... logs from one heavily written file (50-55 lines/s) are shipped with a lag. In lc-admin status I see that it is 75% complete.
Yes, if logs come in faster than Logstash can process them, it will never reach the end of the file. Maybe you need more Logstash instances, or some tuning on the Logstash side to increase throughput.
I'm also considering a redis implementation. I was also preparing the code layout to maybe write a bridge-courier of some sort, which simply receives events and forwards them. It's something that simplifies part of my setup anyway, and it would also mean a single point to update and reload with new Logstash instances. Kind of like a mini-ELB at the application level.
Both might be useful. Though I'm constrained on time with other personal things at the moment, and really need to finish the latest version of log-courier/fact-courier first!
I separated Logstash and Elasticsearch, and they are now on different nodes (3 ES nodes, 6 Logstash nodes), and it works OK. After heavy tests I noticed that Redis caused delays for some of the logs (5-15%).

There was also a problem with HA and clustering: Logstash doesn't write to Redis in a round-robin fashion (I had 6 non-clustered Redis instances). It connects to a random host, so most of the time one or two Redis processes were under heavy load while the other 4 did nothing. The Logstash Redis input requires host to be a single string, so I had to multiply the inputs, one per host (not a problem when the config is handled by puppet; see the sketch below), but the problem begins when the caching instances write to only 1 or 2 Redis instances. Logstash doesn't read faster from the two loaded Redis instances when it sees that the other 4 have no events to process.

So I decided to use RabbitMQ. After migrating to a rabbit cluster it works OK, very smoothly. Logs show up in Kibana much quicker, but there seem to be some performance problems in the Logstash output plugin: after a simulated cluster failure, logs are shipped by log-courier a bit slowly. Indexing and writing events to ES is not a problem, because I can run more indexer/writer instances and simply increase cluster performance. The problem is on the input side. A simple tool which gets events from log-courier and passes them to Redis/RabbitMQ would be nice; Logstash is too heavyweight for such a use case. Tomorrow I'll run stress tests (all apps under heavy load, logging at debug) and measure the performance and stability of the whole cluster.
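For reference, the multiplied inputs looked roughly like this (hostnames are placeholders; the real values come from puppet):

```
input {
  # The redis input only accepts a single string for host,
  # hence one block per instance - repeated for all six.
  redis { host => "redis1.example.com" data_type => "list" key => "logstash" }
  redis { host => "redis2.example.com" data_type => "list" key => "logstash" }
  # ... and so on for redis3 through redis6
}
```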
I've finished the tests. Unfortunately, using Logstash with RabbitMQ as the output has a large impact on log shipping performance: 5 caching instances which read from log-courier and write to RabbitMQ can receive 4k events/s tops; with a file as the output, 30k. A tool which could replace such a Logstash instance would be very useful :)
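For context, each caching instance is essentially this pipeline (a simplified sketch with made-up paths and hostnames, assuming the logstash-input-courier plugin; the real configs are puppet-managed):

```
input {
  # Receive events from log-courier; certificate paths are placeholders
  courier {
    port            => 5043
    ssl_certificate => "/etc/logstash/courier.crt"
    ssl_key         => "/etc/logstash/courier.key"
  }
}
output {
  # Forward everything to the RabbitMQ cluster; exchange settings illustrative
  rabbitmq {
    host          => "rabbit.example.com"
    exchange      => "logstash"
    exchange_type => "direct"
    key           => "logstash"
  }
}
```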
I will close this issue but I won't forget the discussions. I hope to get some time soon to get out a new revision of things!
I'm testing log-courier 2.0.4 (from the official RPM package) and:
logs from log-courier at debug level: http://pastebin.com/mG6KmfgM (and it just sits there waiting...)
Logstash STDOUT (started with debug): http://pastebin.com/zTAFUuKP
and logs from /var/log/logstash/cache.log: http://pastebin.com/MJJRz4HF
Logstash config: http://pastebin.com/Jxj01y0q
log-courier config: http://pastebin.com/E7YL1znH and a file definition (configuration is handled by puppet, so it is similar for the other files): http://pastebin.com/i2P1tKAn
Next I set Logstash to drop all events; the result was the same (broken pipe, connection reset by peer), but I had to wait longer (2-5 min). After a Logstash restart, log-courier waited some time before reconnecting (as expected) and reported a connection problem. Then, after killing and starting the log-courier process again, the same.
Is support for Redis planned? It would be great if so, because log-courier is in my opinion the best log shipper right now (in comparison to filebeat and beaver), and the lack of Redis support is its biggest disadvantage.