confluentinc / parallel-consumer

Parallel Apache Kafka client wrapper with per message ACK, client side queueing, a simpler consumer/producer API with key concurrency and extendable non-blocking IO processing.
https://confluent.io/confluent-accelerators/#parallel-consumer
Apache License 2.0
66 stars 123 forks source link

PC keeps retrying in loop on InvalidPidMappingException: The producer attempted to use a producer id which is not currently assigned to its transactional id #830

Open mikhelef opened 1 week ago

mikhelef commented 1 week ago

Hi,

Seems like Parallel Consumer is not handling InvalidPidMappingException correctly in transactional mode with UNORDERED processing. When this exception occurs, Parallel Consumer enters a retry loop, repeatedly attempting to process the same record and encountering the same exception each time. This results in the consumer becoming stuck, unable to make progress. The TransactionalId is unique per producer, I use UUID when defining it. Also, I am using Poll&Produce method, I don't think I am able to handle the exception in my code.

Expected Behavior

When encountering an InvalidPidMappingException, maybe Parallel Consumer should close the producer that is causing this error and create a new producer instance.

Configuration:

Parallel consumer version : 0.5.2.8

//Parallel Consumer
    ParallelConsumerOptions<String, String> options = 
ParallelConsumerOptions.<String, String>builder()
                .maxConcurrency(16)
                .messageBufferSize(50)
                .batchSize(1)
                .ordering(ProcessingOrder.UNORDERED)
                .commitMode(CommitMode.PERIODIC_TRANSACTIONAL_PRODUCER)
                .build();

//Producer transactional id
transactional.id= TRANSACTIONAL_ID_PREFIX+ UUID.randomUUID();

Thank you for your help.

rkolesnev commented 1 week ago

can you supply a reproducer, please - as an integration test or a minimal app? Just from the description - i am not sure what causes the InvalidPidMappingException to be thrown in your setup - so its hard to evaluate what the best behaviour would be. AFAIK that should not ever happen with correct configuration and transaction handling - so it may be that there is a different bug causing message to be sent outside of started transaction... Just retrying send with new Producer Transaction ID prefix might not actually help (or at the very least whole transaction will need to be re-started and re-tried)

mikhelef commented 1 week ago

Hi,

Unfortunately, I am unable to reproduce the bug. It occurred after two days of inactivity from the producer, which is less than the 7-day transactional.id.expiration.ms setting.

The issue might be due to a broker failure, but I cannot be certain. In any case, I believe that parallel consumer should create a new transaction with a new instance of the producer and not retry the same transaction repeatedly, as this will always fail.