giraffi / fluent-plugin-amqp

Use AMQP broker to send or receive messages via FluentD
MIT License

Support multi workers #48

Closed okkez closed 5 years ago

okkez commented 7 years ago

Fluentd v0.14 supports multiple workers.

fluent.conf:

<system>
  workers 4
</system>
<source>
  # ... snip
</source>
<match *>
  # ... snip
</match>

This configuration launches 4 Fluentd worker processes to process messages. Multiple workers can improve performance :rocket:

We need to implement the following instance method to support multiple workers:

module Fluent::Plugin
  class AMQPInput < Input
    # ...
    def multi_workers_ready?
      true
    end
  end
end

module Fluent::Plugin
  class AMQPOutput < Output
    # ...
    def multi_workers_ready?
      true
    end
  end
end

I think we can add a multi_workers_ready? method that returns true to AMQPOutput. On the other hand, we cannot simply do the same for AMQPInput, because AMQPInput uses the high-level Bunny::Channel#queue method and users can configure durable, exclusive, and auto_delete via fluent.conf.

We should consider multi-worker (multi-process) safety before we support multiple workers in AMQPInput. But I'm not familiar with RabbitMQ and Bunny :cry:

We need to raise the Fluentd dependency to >= 0.14.15 when we add multi-worker support, because Fluentd v0.14.14 and earlier have a bug that prevents using multi-worker-ready plugins together with plugins that are not ready (single-worker only).
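As a rough illustration of that version constraint, the check below uses RubyGems' own version comparison. The constant and method names are made up for this sketch and are not part of the plugin or of Fluentd.

```ruby
require "rubygems"

# Requirement taken from the comment above: multi-worker support
# needs Fluentd 0.14.15 or later. The constant name is hypothetical.
MULTI_WORKER_REQUIREMENT = Gem::Requirement.new(">= 0.14.15")

# Hypothetical helper: true when the given Fluentd version string
# satisfies the requirement.
def multi_worker_safe_fluentd?(version_string)
  MULTI_WORKER_REQUIREMENT.satisfied_by?(Gem::Version.new(version_string))
end

puts multi_worker_safe_fluentd?("0.14.14") # false
puts multi_worker_safe_fluentd?("0.14.15") # true
puts multi_worker_safe_fluentd?("1.0.0")   # true
```

In a gemspec, the same constraint would normally be expressed with add_runtime_dependency "fluentd", ">= 0.14.15".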

warmfusion commented 7 years ago

Interesting. I'm not sure how much performance impact you'd see from this; it'd probably be better to actually use batches and ideally work out how to solve fluentd#1125.

However, for users wanting to use multi-worker configurations, we should look to support such an implementation, as I'm fairly confident it'd just work - the release notes for 0.14.12 provide the requirements which 0.10+ meets (0.14.x native plugin, no listening ports, etc).

I'm not sure why you don't think in_amqp would work? Having multiple consumers on the same queue is by design - it's how you can scale the consumption of events off busy queues by running multiple instances of fluentd on different nodes. The configuration options of those queues have to be the same, otherwise the connection will abort because the target setup doesn't match expectations.
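To make the "multiple consumers on one queue" idea concrete, here is a small in-process analogy using Ruby's built-in Queue and threads. It is only a sketch of the competing-consumers pattern: it does not use Bunny or RabbitMQ, and it does not model acknowledgements, redelivery, or prefetch. The method name and its parameters are invented for the example.

```ruby
# Hypothetical helper: push message_count items onto a shared queue and let
# worker_count threads compete for them. Returns { worker_id => [messages] }.
def simulate_competing_consumers(message_count, worker_count)
  queue = Queue.new
  (1..message_count).each { |m| queue << m }
  queue.close # after close, pop returns nil once the queue is empty

  received = Hash.new { |hash, key| hash[key] = [] }
  lock = Mutex.new

  workers = worker_count.times.map do |worker_id|
    Thread.new do
      # Each pop hands a message to exactly one waiting consumer.
      while (message = queue.pop)
        lock.synchronize { received[worker_id] << message }
      end
    end
  end
  workers.each(&:join)
  received
end

result = simulate_competing_consumers(20, 4)
all_messages = result.values.flatten.sort
puts all_messages == (1..20).to_a # true: every message consumed exactly once
```

How the messages are split between workers varies from run to run; the point is only that each message is taken by a single consumer.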

Doesn't look too hard to implement, but testing might be a bit trickier.

okkez commented 7 years ago

I think https://github.com/fluent/fluentd/issues/1125 is out of scope for this issue.

However, for users wanting to use multi-worker configurations, we should look to support such an implementation, as I'm fairly confident it'd just work - the release notes for 0.14.12 provide the requirements which 0.10+ meets (0.14.x native plugin, no listening ports, etc).

We should require Fluentd v0.14.15 or later when we support multiple workers, because Fluentd v0.14.12 through v0.14.14 have a bug when multi-worker-ready plugins and plugins that are not ready are used at the same time.

I'm not sure why you don't think in_amqp would work?

AMQPInput uses Bunny::Channel#queue. Users can customize :durable, :auto_delete, :exclusive, and arguments via fluent.conf. AMQPInput cannot consume messages with multiple workers if the user sets :exclusive to true. I also think AMQPInput may consume the same messages in different workers.

See also: https://www.rabbitmq.com/consumer-prefetch.html
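One possible way to handle the :exclusive concern, sketched under assumptions: report multi-worker readiness only when the queue is not exclusive. The class below is a stand-in that does not inherit from Fluent::Plugin::Input, and the @exclusive instance variable is assumed to mirror the plugin's exclusive setting; this is not the plugin's actual implementation.

```ruby
# Stand-in for the input plugin class, kept free of Fluentd so it runs alone.
# In the real plugin this method would live in Fluent::Plugin::AMQPInput.
class AMQPInputSketch
  # exclusive: assumed to mirror the `exclusive` option from fluent.conf.
  def initialize(exclusive: false)
    @exclusive = exclusive
  end

  # An exclusive queue can be used by only one connection, so several
  # worker processes cannot share it. Advertise readiness only otherwise.
  def multi_workers_ready?
    !@exclusive
  end
end

puts AMQPInputSketch.new(exclusive: false).multi_workers_ready? # true
puts AMQPInputSketch.new(exclusive: true).multi_workers_ready?  # false
```

This only addresses the exclusive-queue case; whether other queue options need similar handling is left open here.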