david-streamlio / pulsar-nifi-bundle

NiFi Processor for Apache Pulsar
Apache License 2.0

Pulsar consumer/producers leak with percentage of unacked messages when using async mode #25

Closed nicknezis closed 2 years ago

nicknezis commented 2 years ago

When running NiFi alongside a Pulsar cluster deployed on Kubernetes, the Pulsar metrics in the cluster show that producer and consumer counts are stable for a while but eventually start growing. After some time, the growth reaches a limit and messages stop flowing.

We connect from NiFi to the Pulsar brokers through the Pulsar Proxy. We also have the NiFi PulsarConsumer and PulsarProducer configured to use async communication.

When testing with a simple Java client, we find that we can read data out of Pulsar and publish it to NiFi (using the HTTP Listener processor). Because of this, we believe we have isolated the issue to the NiFi processors and/or controller services.

We are also running a few different Pulsar client controller services, and we are not sure whether that would affect performance. Essentially, we have one for publishing and another for consuming data.

Even before the Pulsar producer and consumer counts start climbing, we see NiFi log messages that consistently report an ack count smaller than the received message count. While the consumer processor is running, we also see old data retransmitted. We fear that this failure to ack all received messages may be contributing to the eventual failure of the data flow.
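To make the ack/receive mismatch easier to reason about, here is a minimal sketch of the bookkeeping one could use to see which messages never get a successful async ack. It uses only `java.util.concurrent` and is not taken from the NiFi processors or the Pulsar client: `AckTracker`, `onReceived`, and `trackAck` are hypothetical names, message IDs are plain strings, and the `CompletableFuture<Void>` stands in for whatever future an asynchronous acknowledge call returns.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration only; not part of pulsar-nifi-bundle
// or the Pulsar client library.
class AckTracker {
    private final AtomicLong received = new AtomicLong();
    private final AtomicLong acked = new AtomicLong();
    // IDs of messages that were received but not yet successfully acked.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();

    // Record a message as soon as it is handed to the flow.
    void onReceived(String messageId) {
        received.incrementAndGet();
        pending.add(messageId);
    }

    // Attach to the future of an asynchronous ack. The message only counts
    // as acked when that future completes without an error.
    CompletableFuture<Void> trackAck(String messageId, CompletableFuture<Void> ackFuture) {
        return ackFuture.whenComplete((ignored, error) -> {
            if (error == null && pending.remove(messageId)) {
                acked.incrementAndGet();
            }
        });
    }

    long received() { return received.get(); }

    long acked() { return acked.get(); }

    // Snapshot of message IDs that are candidates for redelivery.
    Set<String> unacked() { return new HashSet<>(pending); }

    public static void main(String[] args) {
        AckTracker tracker = new AckTracker();
        tracker.onReceived("m1");
        tracker.onReceived("m2");
        tracker.onReceived("m3");

        // m1: the async ack succeeds.
        tracker.trackAck("m1", CompletableFuture.completedFuture(null));

        // m2: the async ack fails, so the message stays pending.
        CompletableFuture<Void> failedAck = new CompletableFuture<>();
        failedAck.completeExceptionally(new IllegalStateException("ack failed"));
        tracker.trackAck("m2", failedAck);

        // m3: no ack is ever issued.
        System.out.println("received=" + tracker.received()
                + " acked=" + tracker.acked()
                + " unacked=" + tracker.unacked().size());
    }
}
```

If an async ack future fails or is never issued, the message stays in the pending set. That would be consistent with the smaller ack count in the logs and with the old data being retransmitted, though this sketch does not confirm that as the cause.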

david-streamlio commented 2 years ago

Resolved