calohmn closed this 3 years ago
The behaviour of the KafkaProducer when trying to publish on a non-existing topic (with topic auto-creation disabled in the server) is:
- If the producer has no current metadata for the topic: block on send() for the max.block.ms period (default 1 minute), resulting in a org.apache.kafka.common.errors.TimeoutException: Topic [topic] not present in metadata after [max.block.ms value] ms. exception.
- Within metadata.max.age.ms (default 5 minutes) after the producer last got a metadata update for the topic: block on send() for the delivery.timeout.ms period (default 2 minutes), resulting in a org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for [topic]:[x >= delivery.timeout.ms] ms has passed since batch creation exception. After that, there will be repeated further attempts (every 100 ms, it seems) to update the metadata for the topic on the kafka-producer-network-thread for a period of metadata.max.idle.ms (default 5 minutes).
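For reference, the four producer settings named above can be collected in one place. This is only a minimal sketch: the property keys are the Kafka producer configuration names from the text and the values are the defaults stated there, but the class name ProducerTimeoutSettings and its two methods are made up for illustration and are not part of Hono or the Kafka client.

```java
import java.util.Properties;

/** Illustrative holder for the producer timeout settings discussed above (not a Hono or Kafka class). */
final class ProducerTimeoutSettings {

    private ProducerTimeoutSettings() {
    }

    /** Returns the four settings with the default values mentioned in the text (all in milliseconds). */
    static Properties defaults() {
        final Properties props = new Properties();
        props.setProperty("max.block.ms", "60000");          // 1 minute
        props.setProperty("delivery.timeout.ms", "120000");  // 2 minutes
        props.setProperty("metadata.max.age.ms", "300000");  // 5 minutes
        props.setProperty("metadata.max.idle.ms", "300000"); // 5 minutes
        return props;
    }

    /** Returns a copy of the given properties with one setting replaced; the original is left untouched. */
    static Properties withOverride(final Properties base, final String key, final long millis) {
        final Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty(key, Long.toString(millis));
        return copy;
    }
}
```

A call such as ProducerTimeoutSettings.withOverride(ProducerTimeoutSettings.defaults(), "max.block.ms", 30000) would produce properties that could be passed to a producer; the 30000 here is an arbitrary example value, not a recommendation.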
Workarounds/ways to prevent this:
- Don't delete the hono.command_internal.[adapterInstance] topic on adapter shutdown -> this would leave behind many unused topics over time.

The proper way to solve this is to check the status of the corresponding adapter instance before publishing on the hono.command_internal.[adapterInstance] topic, using the AdapterInstancesLivenessService, as planned in #2028.
An AdapterInstancesLivenessService has now been implemented and is being used before publishing on the internal command topic (#2028).
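The idea of checking liveness before publishing can be sketched roughly as follows. This is not the actual Hono implementation: the class GuardedCommandForwarder is hypothetical, the Predicate stands in for whatever check the AdapterInstancesLivenessService offers, and the BiConsumer stands in for the Kafka send. Only the topic prefix is taken from the text above.

```java
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

/** Hypothetical sketch: forwards a command to an adapter's internal topic only if the adapter looks alive. */
final class GuardedCommandForwarder {

    private static final String TOPIC_PREFIX = "hono.command_internal.";

    // Assumption: stand-in for the liveness service, returns true if the adapter instance is alive.
    private final Predicate<String> adapterAlive;
    // Assumption: stand-in for the Kafka producer, receives (topic, payload).
    private final BiConsumer<String, String> publisher;

    GuardedCommandForwarder(final Predicate<String> adapterAlive, final BiConsumer<String, String> publisher) {
        this.adapterAlive = Objects.requireNonNull(adapterAlive);
        this.publisher = Objects.requireNonNull(publisher);
    }

    /**
     * Publishes the payload on the adapter's internal topic if the adapter is alive.
     *
     * @return true if the payload was handed to the publisher, false if it was skipped.
     */
    boolean forward(final String adapterInstanceId, final String payload) {
        if (!adapterAlive.test(adapterInstanceId)) {
            // The topic of a dead adapter may already be deleted, so do not touch the producer at all.
            return false;
        }
        publisher.accept(TOPIC_PREFIX + adapterInstanceId, payload);
        return true;
    }
}
```

Skipping the send for a dead adapter means the producer never requests metadata for a topic that may no longer exist, which is what avoids the blocking and retry behaviour described earlier.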
Edge cases where the liveness service hasn't yet noticed that the adapter is dead, so that commands still get forwarded to the adapter, might still occur. But in any case this would be an exceptional scenario. Normally, on adapter shutdown, the command-to-adapterInstance mappings first get removed by the adapter (via unregisterCommandConsumer invocations, see also #2760) and only then does the internal command topic get deleted. To prevent such scenarios, which are probably caused by errors invoking unregisterCommandConsumer, I think it would make sense to first look at #2760.
After a protocol adapter pod has been stopped, the Command Router might still try to forward command messages on the internal command address (if the command consumers using that adapter were not properly unregistered).
This should not result in any problems - the assignment of any devices to the obsolete adapter instance id will be overwritten by devices connecting to different adapter instances over time anyway.
But a look at the logs of the Command Router reveals that the Kafka producer on the internal topic keeps trying to fetch metadata for the deleted topic for a long time: