Open will118 opened 3 years ago
@edenhill
Thanks for a great report! Will investigate next week.
I believe this check: https://github.com/edenhill/librdkafka/blob/master/src/rdkafka_txnmgr.c#L2367
needs to go before this check: https://github.com/edenhill/librdkafka/blob/master/src/rdkafka_txnmgr.c#L2346
Will add a test-case to verify this is indeed the problem, but it seems very likely.
Side note on your config:
EnableIdempotence = true, <-- enabled by default when transactional.id is set.
MaxInFlight = 1, <-- no need for this, it just slows down throughput. The idempotent producer has a window of 5 in-flight requests.
MessageSendMaxRetries = Int32.MaxValue, <-- this is already the default for the idempotent producer
Acks = Acks.All, <-- this is already the default
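For illustration, the side note above amounts to a trimmed-down configuration along these lines. This is only a sketch: the broker address and the transactional id are placeholder values that do not come from the report, and the property names are assumed to be those of the Confluent.Kafka ProducerConfig class used in the original config.

```csharp
using Confluent.Kafka;

// Sketch only: both values below are hypothetical placeholders.
var config = new ProducerConfig
{
    BootstrapServers = "kafka:9092",              // assumed address, not from the report
    TransactionalId = "example-transactional-id", // assumed id, not from the report

    // Deliberately left unset, per the side note above:
    //   EnableIdempotence      (enabled by default when a transactional id is set)
    //   MaxInFlight            (the idempotent producer has a window of 5 in-flight requests)
    //   MessageSendMaxRetries  (already the default for the idempotent producer)
    //   Acks                   (Acks.All is already the default)
};
```

Leaving these out should not change behaviour if the defaults are as described above; it only removes the throughput penalty of MaxInFlight = 1.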
Yeah, some of them are indeed unnecessary. We are using MaxInFlight = 5 in our app.
Thanks for looking into the issue, what you've said makes sense based on my limited familiarity.
Looking forward to the next release!
Description
If connectivity to the cluster is lost after a call to CommitTransaction, subsequent transactions will "succeed" even though there is no connectivity to the broker.
I don't think this is the intended behaviour. My understanding was that we could ignore the delivery reports with a transactional producer (see https://github.com/edenhill/librdkafka/blob/5fa114ccab90b0a7640b2621bf3e88314d731b84/examples/transactions.c#L102-L109).
Based on the debug logs (see below), the "No partitions registered: not sending EndTxn" line made me think of this change: https://github.com/edenhill/librdkafka/pull/3271, but I haven't investigated.
Killing connectivity at other points behaves fine (e.g. before committing).
Apologies in advance if there is something I haven't understood.
How to reproduce
This repros consistently for me (100%)
The way I have been simulating connectivity issues is with socat, followed by docker-compose -f docker-compose.kafka.yml kill socat (with 127.0.0.1 kafka in my hosts file).
This is the output from kafkacat -f '%o %s'; the gap you can see is where kafka was down.
The librdkafka logs look like this when "successfully" committing:
Checklist
Please provide the following information:
1.6.2 (librdkafka 1.6.1)
2.7.0