elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

RabbitMQ output #581

Closed Semyazz closed 6 years ago

Semyazz commented 8 years ago

Any plans to implement a RabbitMQ (RMQ) output? I have isolated environments and I want to send everything through RMQ. I use cert-based auth and so on, and it'd be awesome to use the same bus here.

monicasarbu commented 8 years ago

Not yet, but it would be great if you take the challenge and add support for it in Packetbeat. We are always encouraging our community to help us by adding support for the protocols they know the best.

monicasarbu commented 8 years ago

@Semyazz Ah, sorry, I just noticed that you are referring to a RabbitMQ output. Beats does not support RabbitMQ as an output, but you can send data to Logstash, which has a RabbitMQ output plugin: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-rabbitmq.html. We don't plan to add more outputs to Beats; see for example the discussion here: https://github.com/elastic/filebeat/issues/132

Semyazz commented 8 years ago

Yeah, I've seen that discussion and I kinda disagree. To me any queue, especially RabbitMQ, which is I guess the most standard solution and supports things like cert-based auth (to me the most important), gives you much more freedom to plan your architecture than Logstash. Yeah, you can implement clustering, routing and so on in Logstash, but why, if you already have a working solution?

Basically I need something light and fast that sends all collected data to my data bus (RMQ). From there I can route it wherever I want, using whatever topology I come up with. At some point I already have Logstash processing that data and pushing it back to RMQ to send it on to another place. After reading many posts about logstash/logstash-forwarder and the ELK deployments people run, I believe many people have the same or at least similar use cases.

Generally I like the Linux tools philosophy, where each tool does its job reliably and doesn't try to be a huge multipurpose piece of software. In this case RMQ is to me a great data bus, Logstash is a perfect data processor, and Beats, just like logstash-forwarder, is the perfect data collector.

rlwmmw commented 8 years ago

+1.
There are obviously multiple ways to approach the problem of moving data around, but having the ability to introduce message queuing at every stage would greatly improve the accuracy of log collection, and alleviate back pressure on LS and ES!

geekpete commented 8 years ago

I'm in the same boat, as I currently use rabbitmq to queue inbound messages then have them consumed by a remote logstash. So the only currently supported method involving rabbit is to have beats send to a logstash that outputs to rabbit? I have to insert an additional logstash service in between my log shipper (beats) and my queue (rabbitmq)?

Back to python-beaver forwarder I guess. https://github.com/python-beaver/python-beaver

Add to that, the latest stable version of rabbitmq (3.6.0) has a new feature called lazy queues which will allow you to easily store/backlog hundreds of millions of messages using only a couple of hundred megabytes of ram, as long as the network and disk can handle the inbound. https://www.rabbitmq.com/lazy-queues.html
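For anyone wanting to try the lazy queue mode mentioned above, here is a minimal sketch of declaring one from a client. It assumes a pika-style channel object whose `queue_declare` accepts `queue`, `durable` and `arguments` keywords; the helper names and the queue name `logs` are hypothetical. The `x-queue-mode` argument is the per-queue switch described in the RabbitMQ lazy queues documentation.

```python
def lazy_queue_arguments():
    """Queue arguments that ask RabbitMQ to keep messages on disk (lazy mode)."""
    return {"x-queue-mode": "lazy"}


def declare_lazy_queue(channel, queue_name):
    """Declare a durable lazy queue on an already-open channel.

    `channel` is assumed to behave like a pika channel; nothing here
    opens a connection, so the caller stays in charge of TLS and auth.
    """
    return channel.queue_declare(
        queue=queue_name,
        durable=True,
        arguments=lazy_queue_arguments(),
    )
```

With a real broker you would pass in the channel obtained from your connection, for example `declare_lazy_queue(channel, "logs")`. The same mode can also be applied through a policy instead of a client-side argument, which avoids redeclaring existing queues.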

monicasarbu commented 8 years ago

@geekpete Yes, currently you need a logstash instance in between to transfer the data from the beats shipper to rabbitmq.

fatmcgav commented 8 years ago

It would be great if RabbitMQ would be considered for #943.

As above, I've got an existing RMQ BUS in place, and currently use Beaver to stick the logs into RMQ for Logstash to then consume...

geekpete commented 8 years ago

@fatmcgav What version of beaver are you using?

fatmcgav commented 8 years ago

@geekpete Apologies for the delay in responding... I'm currently running Beaver 34.1.0...

Cheers Gav

monicasarbu commented 8 years ago

RabbitMQ output was requested also in the old libbeat repository: https://github.com/elastic/libbeat/issues/313

timstoop commented 8 years ago

Reiterating my comment from the old ticket:

We use RabbitMQ as a buffer and as a way to easily distribute messages to the processing logstashes. Logstash itself tends to require a lot more resources to be able to handle the datastream than a RabbitMQ, even when it's only configured to push messages towards a queue. RabbitMQ is in our opinion far better at handling sudden increases in events than logstash with the added benefit of being a buffer in case the logstashes can't handle the traffic by themselves. Scaling in another logstash or two to add processing power would then quickly empty the queues.
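To illustrate the buffering argument above, here is a toy model, not a benchmark. It tracks queue depth per tick when a burst arrives and consumers are added afterwards. All numbers and the function name `simulate_backlog` are made up for illustration and say nothing about real Logstash or RabbitMQ throughput.

```python
def simulate_backlog(arrivals, consumers, rate_per_consumer):
    """Return the queue depth after each tick.

    arrivals[i]  : messages published during tick i
    consumers[i] : number of consumers running during tick i
    rate_per_consumer : messages one consumer can process per tick
    """
    depth = 0
    history = []
    for arrived, active in zip(arrivals, consumers):
        capacity = active * rate_per_consumer
        # The queue absorbs whatever the consumers cannot keep up with.
        depth = max(0, depth + arrived - capacity)
        history.append(depth)
    return history


# Two consumers at first, a burst in ticks 2-3, then two more consumers are added.
arrivals = [100, 100, 500, 500, 100, 100, 100, 100]
consumers = [2, 2, 2, 2, 4, 4, 4, 4]
history = simulate_backlog(arrivals, consumers, rate_per_consumer=60)
```

In this made-up run the backlog peaks at the end of the burst and then shrinks every tick once the extra consumers are running, which is the behaviour described above: the queue holds the burst, and added processing capacity drains it.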

Another reason for us is so we can better manage resources between customers. We maintain servers for several customers, each with their own queue in RabbitMQ. The logstashes are set up to treat each queue equally, so a sudden increase in traffic for customer X does not necessarily cause a slowdown of the processing for customer Y as well. In logstash, we would have to open additional ports for doing this (as far as I know, at least). That's why we would like to have AMQP support for libbeat.

Someone in this thread mentioned certificate-based authentication. We need that as well, but as I wasn't sure if logstash supported that (and there are ways around it if need be), I didn't originally mention it. We get a lot of data from other datacenters and we do not always have a VPN between them, so having certificate-based authentication and encryption is very nice. We use Beaver as the client currently as well.
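The per-customer queue point above can be sketched as a simple round-robin consumer. This is an in-memory illustration only, not how Logstash or RabbitMQ schedule work internally. The function name `round_robin_drain` and the queue names are hypothetical.

```python
from collections import deque


def round_robin_drain(queues, budget):
    """Process up to `budget` messages, taking one message per queue in turn.

    `queues` maps a customer name to a deque of pending messages.
    Returns the (customer, message) pairs in processing order.
    """
    processed = []
    order = deque(queues)  # visiting order of customer names
    while budget > 0 and any(queues[name] for name in order):
        name = order[0]
        order.rotate(-1)  # move this customer to the back of the line
        if queues[name]:
            processed.append((name, queues[name].popleft()))
            budget -= 1
    return processed


# Customer X has a large backlog, customer Y only a few messages.
queues = {
    "customer_x": deque(range(1000)),
    "customer_y": deque(["y1", "y2", "y3"]),
}
processed = round_robin_drain(queues, budget=6)
```

Because each pass takes at most one message per queue, customer Y's three messages are handled within the first six slots even though customer X has a thousand waiting. With a single shared queue they would sit behind X's backlog.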

geekpete commented 8 years ago

Unless there can be some low memory mode for logstash, then it won't compete with the lazy queue option in the latest rabbitmq 3.6.0. Lazy queues allow backlogs of hundreds of millions of messages in a single rabbit server (given there is enough disk space and fast enough disk to cater for it) but only uses a few hundred megabytes of ram. I'd call that super lightweight. (oops I mentioned this rabbit feature in an above comment, sorry.)

If logstash could be rigged in the same way, with some new special queue/buffer mode, then you wouldn't need the larger footprint just to buffer messages.

Either that or some kind of "proxy beat" written in Go that only acts as a message queue/proxy.

timstoop commented 8 years ago

And I personally wouldn't want logstash to do that. Focus on processing, leaving the queuing to the sw projects that focus on that, imho. That's one of the nice things about open source, chaining services that you have experience with to get the best end result.

geekpete commented 8 years ago

Do one thing well.

geekpete commented 8 years ago

So I actually took the time to go and read the original reasoning as mentioned here https://github.com/elastic/filebeat/issues/132 and I can see why it makes sense from a maintenance point of view. The work is done in logstash to support all the output plugins and anything missing between beats and logstash will be coming soon. But it adds another box into the stack. I'd probably consider removing rabbit if logstash performs ok and just going from logstash to logstash.

I wonder how much of a burst a logstash vs a rabbitmq could take on the same resources, I suppose it should be similar if written well.

ranleyos commented 8 years ago

I also have a corporate-wide Rabbit solution already in place. It is not only wise to continue to use that as my stream buffer, but it is a necessity. Our AMQP highway is already paved and heavily used. Filebeat (and/or the entire beats library) should be able to send directly to the AMQP stream and THEN Logstash can get involved. I'd really like to see this happen. I also think that this would GREATLY help the ELK stack in general by adding flexibility.

ziporah commented 8 years ago

+1 We use rabbitmq as a redundant failover buffer in between systems. It is our main AMQP system, as redis was not yet easy to configure for redundancy when the design was made. Our entire stack is now built upon rabbitmq and we are not planning to change the entire design only to be able to push logs with beats. I also think it is stupid to first run beats and then logstash on the producer side, only to have it push to rabbitmq and the dynamic logstash pool in the backend. You can just as well run only logstash and push directly to rabbitmq, making beats obsolete. No processing is done on the producer side anyway, simple input {file *} output{rabbitmq}

ranleyos commented 8 years ago

Quite right! In addition, I have several teams, and requiring that they install Logstash on the producer side is not an option, and doesn't make sense either. I cannot see how adding a rabbitmq output would be that much extra work. Perhaps an option would be for Elastic to push the original source and then let the community take care of it?

johntdyer commented 8 years ago

I could not agree more !

pietervogelaar commented 8 years ago

+1

An output for a message queue seems very logical. As RabbitMQ is a very popular message queue, I would really like an output for RabbitMQ!

ankopainting commented 8 years ago

+1 we use rabbitmq w/ beaver currently and it would be good to replace beaver with all the beats

lucasreed commented 8 years ago

+1 this would help greatly!

warbaugh commented 8 years ago

+1

pierrefevrier commented 8 years ago

+1

zepag commented 8 years ago

+1

bladedoyle commented 8 years ago

Need it. +1

thenom commented 8 years ago

+1

dkinon commented 8 years ago

+1

Beamboom commented 8 years ago

-1 It's a wise decision. It's enough that logstash supports a plenitude of outputs; otherwise the same work would have to be done here too. Use a logstash instance before RabbitMQ instead.

pietervogelaar commented 8 years ago

@beamboom How do you feel about filebeat that has redis support? Would you like to remove that also?

Beamboom commented 8 years ago

I'd not mind that, no.

dkinon commented 8 years ago

@Beamboom not sure how redis support fits your argument but not rabbitmq.

I use rabbitmq currently but would switch to redis in a second if filebeat supported it. So +1 redis support and +1 rabbitmq support, whichever gets us there quicker.

Logstash HA deployments in most, if not all, cases involve running logstash in 2 clusters/roles: logstash shipper (logstash-lumberjack collects logs from filebeat/logstash-forwarder and delivers them to a message queue) and a logstash parser (collects logs from message queue, parses appropriately and delivers to elasticsearch). With that architecture, IMHO having filebeat deliver to a message queue would be a welcome reduction in complexity and reduce the overall latency/resources of the entire logstash pipeline.

Beamboom commented 8 years ago

I'm sorry, then I misunderstood the question. I don't mind there being no support for either. I think the arguments from the Beat developers are valid and good.

Build your stack with one or more logstash after beat and before your RMQ/Redis, and do whatever you like after that. That focuses the work on output plugins on one layer - logstash - and enables Beat development to push forward.

I totally agree with that strategy.

ant31 commented 8 years ago

I think as users, we would like to replace the logstash shipper with filebeat for multiple reasons. But in doing that we don't want to change the chain after this point.

from: logstash -> rmq -> logstash -> outputs
to: filebeat -> rmq -> logstash -> outputs

Rabbitmq has been the recommended broker, and many configurations include it.
It's fair to ask for its support in filebeat.

JorisAndrade commented 8 years ago

+1

sinchb commented 8 years ago

+1

trevorndodds commented 8 years ago

+1

I would like to see RabbitMQ added as an output for beats mainly because it's the only mature Windows option.

Redis does not officially support windows, so running Redis on any windows production environment is out.

Kafka has this bug (KAFKA-1194) which is a major issue on windows. Perhaps some other bugs too but this one is a big issue for me since Kafka is unable to perform cleanups of old logs.

froztbyte commented 8 years ago

I don't see anything mentioning a branch in this thread, so: has anyone taken a stab at implementing this?

If not, which code in beats should I look at to get an idea for starting on this?

andrewkroh commented 8 years ago

@froztbyte I'm not aware of anyone working on this. The relevant interfaces to look at are in https://github.com/elastic/beats/blob/master/libbeat/outputs/outputs.go and you can use the existing outputs as examples.

I recommend following the guidance in this comment with regard to how to do the development outside of the main project. This will enable you to develop and maintain the output without the overhead of maintaining a fork.

To be clear, we are not interested in maintaining additional outputs at the current time. There is a lot of work involved for us to support additional outputs. We are a small team and there are a bunch of other enhancement requests that we are focused on. We are happy to help by answering questions you have about the code or by reviewing code you develop.

froztbyte commented 8 years ago

@andrewkroh Thanks for the pointers, I'll dig into them.

At the risk of sounding nagging, is it possible that Elastic might reconsider the position held on other outputs? Is there a possible middleground of external contribution for the feature/support?

I understand the effort cost involved in developing and supporting additional outputs, but on balance it seems that there is both a large amount of community desire for this feature and the benefit of this feature adding support for a mode that would otherwise require a trampoline logstash instance (at this stage, at least).

selfieblue commented 7 years ago

+1

cdemi commented 7 years ago

+1

gplesz commented 7 years ago

👍

chhuang0123 commented 7 years ago

+1

warbaugh commented 7 years ago

We'll be releasing a RMQ plugin for libbeats in about a month. We need to clean it up, and do some more testing. But, it has been working reliably for a few months now.

cjlyons81 commented 7 years ago

warbaugh I am very interested in your RMQ plugin, would love to help test if you need?

viniiciusconceicao commented 7 years ago

@warbaugh I am also very interested in your RMQ plugin, let us know when you are ready to release it :)

ranleyos commented 7 years ago

Ditto, I'd be available to help develop or QA if your team would like any help.

warbaugh commented 7 years ago

We've written this for a specific use case, and therefore it isn't a fully featured implementation. It has a fair amount of run time against it now, but lots of RMQ features are missing.

If people are ok with that, we can make the github repository public. I just don't want people's expectations to be too high.

timstoop commented 7 years ago

We're ok with that, as long as you have something that works, we'll create the PRs to expand the functionality ;-)