confluentinc / parallel-consumer

Parallel Apache Kafka client wrapper with per message ACK, client side queueing, a simpler consumer/producer API with key concurrency and extendable non-blocking IO processing.
https://confluent.io/confluent-accelerators/#parallel-consumer
Apache License 2.0

[OOM] Offset map reached 75%, but pc-broker-poll thread keeps polling records, leading to OOM #832

Open dumontxiong opened 2 weeks ago

dumontxiong commented 2 weeks ago

Hi team, from the pc_metadata_space_used metric we can see that after 17:38 the framework stops encoding:

[Screenshot: pc_metadata_space_used metric, 2024-09-10 17:09:42]

And we can see this log line:

io.confluent.parallelconsumer.state.PartitionState couldBeTakenAsWork 649 PartitionState.java - Not allowed more records for the partition (XXX) as set from previous encode run (blocked), that this record (571268) belongs to, due to offset encoding back pressure, is within the encoded payload already (offset lower than highest succeeded, not in flight (true), continuing on to next container in shardEntry.

But pc-broker-poll keeps polling records the whole time (see the kafka_consumer_fetch_manager_bytes_consumed_rate metric below):

[Screenshot: kafka_consumer_fetch_manager_bytes_consumed_rate metric, 2024-09-11 10:38:21]

And from the pc_waiting_records metric we can see that the number of waiting records keeps increasing.

[Screenshot: pc_waiting_records metric, 2024-09-11 13:46:37]

And the service threw an OOM error in the end.

I have two concerns:

  1. The pc_waiting_records metric values are negative. Is that normal?
  2. When the offset map reaches 75% and no more records are allowed to be processed, could the pc-broker-poll thread continuing to poll records cause any memory issues?

BRs, Dumont

rkolesnev commented 2 weeks ago

Hi @dumontxiong

  1. pc_waiting_records being negative is not normal - there must be a bug in the calculation of that metric.

  2. In fact, that metric uses the internal calculation / counter that determines how loaded the Parallel Consumer is. A bug in that calculation that produces a low / negative number will stop the Parallel Consumer's back pressure / poll-pausing logic from applying, leading to the runaway memory usage you are observing. That looks like a bug somewhere in the work tracking.
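Roughly speaking, the mechanism works like the conceptual sketch below. This is only an illustration of counter-driven poll pausing, not the actual parallel-consumer code, and the names in it (`PollBackPressure`, `loadedRecordCount`, `maxBufferedRecords`) are invented for the example:

```java
import java.util.concurrent.atomic.AtomicLong;

// Conceptual sketch only - NOT the real parallel-consumer implementation.
// It illustrates why an under-counting "loaded records" counter stops the
// poll-pausing back pressure from ever kicking in.
class PollBackPressure {
    private final long maxBufferedRecords;                       // e.g. derived from concurrency and load factor
    private final AtomicLong loadedRecordCount = new AtomicLong();

    PollBackPressure(long maxBufferedRecords) {
        this.maxBufferedRecords = maxBufferedRecords;
    }

    void onRecordBuffered()  { loadedRecordCount.incrementAndGet(); }
    void onRecordCompleted() { loadedRecordCount.decrementAndGet(); }

    // Consulted by the broker-poll loop before fetching more records.
    // If the counter drifts low / negative (double decrements, missed
    // increments, ...), this never returns true, polling is never paused,
    // and the buffered records grow without bound.
    boolean shouldPausePolling() {
        return loadedRecordCount.get() >= maxBufferedRecords;
    }
}
```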

Just to be sure - what is your configuration (ParallelConsumerOptions)? Specifically the options related to message buffer size - concurrency, batch size, load factor (initial and max) and messageBufferSize.

Can you describe the scenario in which this was observed? High load? High number of retries? Straight after a rebalance? Anything like that - the more details we can get, the better.

dumontxiong commented 1 week ago

@rkolesnev

The back pressure logic is working - it is indeed pausing the processing of records now, but the pausing of polling is not working: it keeps polling records and causes the OOM.

Configuration: ordering=KEY, maxConcurrency=14, commitMode=PERIODIC_CONSUMER_SYNC. No other parameters are explicitly specified.
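For reference, that configuration expressed with the ParallelConsumerOptions builder looks roughly like the sketch below (bootstrap servers, group id and topic name are placeholders; everything not shown is left at its default):

```java
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.List;
import java.util.Properties;

import static io.confluent.parallelconsumer.ParallelConsumerOptions.CommitMode.PERIODIC_CONSUMER_SYNC;
import static io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder.KEY;

public class PcSetupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder
        props.put("group.id", "pc-oom-repro");                // placeholder
        props.put("enable.auto.commit", "false");             // parallel-consumer manages commits itself
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props);

        // Only the options mentioned above are set explicitly.
        ParallelConsumerOptions<String, String> options = ParallelConsumerOptions.<String, String>builder()
                .ordering(KEY)
                .maxConcurrency(14)
                .commitMode(PERIODIC_CONSUMER_SYNC)
                .consumer(kafkaConsumer)
                .build();

        ParallelStreamProcessor<String, String> pc = ParallelStreamProcessor.createEosStreamProcessor(options);
        pc.subscribe(List.of("input-topic"));                  // placeholder topic
    }
}
```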

Scenario: in my test, 50% of records fail, and a failed record keeps retrying until it succeeds (in fact it never succeeds, it just keeps retrying). At that point the CPU is at almost 100%, so it is high load with a high number of retries.
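A minimal sketch of that load pattern, continuing from the setup above (how the failing half is selected - by key hash here - and the exact PollContext accessor are assumptions for illustration):

```java
// Sketch of the described test load: roughly half of the records always fail,
// so parallel-consumer retries them indefinitely and they never complete.
pc.poll(context -> {
    var consumerRecord = context.getSingleConsumerRecord();
    boolean permanentlyFailing = Math.abs(consumerRecord.key().hashCode()) % 2 == 0;
    if (permanentlyFailing) {
        // Throwing marks the record as failed, so it will be retried forever.
        throw new RuntimeException("simulated permanent failure for key " + consumerRecord.key());
    }
    // otherwise: successful (no-op) processing
});
```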

BRs, Dumont

rkolesnev commented 6 days ago

@dumontxiong - the backpressure logic is exactly the logic that controls polling for records, so there must be an issue in there - potentially caused by something being off in the calculation of retries (given that your scenario involves a high number of retries). It is separate from the acceptance of work into worker threads based on metadata size usage. Can you provide reproduction code for this, either as a small application or an integration test?