cityindex-attic / logsearch

[unmaintained] A development environment for ELK
Apache License 2.0
Evaluate Apache Kafka #273

Closed sopel closed 9 years ago

sopel commented 10 years ago

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log - the implications of this design for our use case are so appealing that they are worth repeating here [emphasis mine] :

Fast - A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.

Scalable - Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers.

Durable - Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.

Distributed by Design - Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
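
To make the "partitioned" and "co-ordinated consumers" points a bit more concrete, here is a minimal, purely illustrative Python sketch. It is not Kafka's implementation or client API: the function names, the CRC32 hash and the round-robin assignment are all assumptions chosen for illustration. The idea is only that a stable hash of a message key picks a partition, and that the partitions are then divided among the consumers of a group.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition using a stable hash.

    CRC32 is an assumption for this sketch, not Kafka's actual partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, consumers: list) -> dict:
    """Spread partitions across the consumers of one group, round-robin.

    Hypothetical helper: real consumer coordination is more involved.
    """
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


if __name__ == "__main__":
    # Log lines keyed by (hypothetical) host name: the same key always
    # lands on the same partition, so per-host ordering is preserved.
    for host in ["web-1", "web-2", "db-1"]:
        print(host, "-> partition", partition_for(host, 4))

    # Two consumers share four partitions between them.
    print(assign_partitions(4, ["consumer-a", "consumer-b"]))
```

Adding a consumer to the group in this sketch simply changes the output of `assign_partitions`, which is roughly why partitioning lets the consuming side scale out alongside the brokers.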

Not surprisingly, and despite currently being at version 0.8, Kafka is already used by quite a few major companies (the powered by list is almost a who's who of cloud computing success stories in general and logging/metrics/monitoring ones in particular) - some of these solutions we are either using ourselves, trying to approximate/extend within this project, or both.

It seems reasonable to evaluate whether it might be the proper solution for resilience and scale-out of the messaging pipeline (see e.g. #269 and #272).

sopel commented 10 years ago

A few observations upfront:

So it is a somewhat more involved approach to the problems it addresses, but also a fairly thorough one. See Announcing Suro: Backbone of Netflix's Data Pipeline for a good example of what the resulting capabilities and architectures might look like.

sopel commented 10 years ago

@dpb587 writes in https://github.com/cityindex/logsearch/issues/269#issuecomment-30524547 :

I've never heard of Kafka, but skimming the docs it looks valuable. Unfortunately, it doesn't seem like logstash supports it as an input. Logstash does, however, support rabbitmq...

Yeah, an official logstash input is not available yet, which is a bit surprising given that this seems simple in principle; see some related attempts:

sopel commented 10 years ago

Moved to Icebox due to focus on #270 (as discussed in https://github.com/cityindex/logsearch-config/issues/59).

joekiller commented 10 years ago

Hi everyone,

I'm working to clean my distro up into a plugin that you can just add to a logstash distro and get the producer working as well. I'll try to remember to post here when it is finished. @squito has already provided a pull request with a producer that I'm evaluating.

sopel commented 10 years ago

@joekiller - thanks for the heads up, sounds promising - I'm looking forward to a readily available Kafka<->logstash integration :)

joekiller commented 10 years ago

I've updated the distro to make using it easier; however, the producer isn't "production worthy" yet. The consumer is, though. Also, I have moved all work to this repo: https://github.com/joekiller/logstash-kafka

sopel commented 9 years ago

Closed as Incomplete due to not being actively evaluated right here, but being the mid-term goal now regardless - going forward, this will be discussed and hopefully implemented via https://github.com/logsearch/logsearch-boshrelease/issues/73.

joekiller commented 9 years ago

Just an FYI: my logstash-kafka plugin is being actively merged into logstash core: https://github.com/elasticsearch/logstash/pull/1533

Producer and consumer are both working well.

mrdavidlaing commented 9 years ago

@joekiller, thanks for the heads-up - we'll definitely be looking at using your logstash-kafka plugin when we do this work as part of https://github.com/logsearch/logsearch-boshrelease/issues/73.