mbrancato opened 5 years ago
Something I noticed is that every now and then I get a log from KSQL like the following:
```
[2018-10-24 01:06:06,367] WARN stream-thread [_confluent-ksql-default_query_InsertQuery_42-2fdba422-f172-45f0-966a-1fee31bfb44d-StreamThread-172] Detected task 0_0 that got migrated to another thread. This implies that this thread missed a rebalance and dropped out of the consumer group. Will try to rejoin the consumer group. Below is the detailed description of the task:
TaskId: 0_0
	ProcessorTopology:
		KSTREAM-SOURCE-0000000000:
			topics: [events]
			children: [KSTREAM-MAPVALUES-0000000001]
		KSTREAM-MAPVALUES-0000000001:
			children: [KSTREAM-TRANSFORMVALUES-0000000002]
		KSTREAM-TRANSFORMVALUES-0000000002:
			children: [KSTREAM-FILTER-0000000003]
		KSTREAM-FILTER-0000000003:
			children: [KSTREAM-MAPVALUES-0000000004]
		KSTREAM-MAPVALUES-0000000004:
			children: [KSTREAM-MAPVALUES-0000000005]
		KSTREAM-MAPVALUES-0000000005:
			children: [KSTREAM-SINK-0000000006]
		KSTREAM-SINK-0000000006:
			topic: StaticTopicNameExtractor(ALERTS)
Partitions [events-0]
 (org.apache.kafka.streams.processor.internals.StreamThread:773)
[2018-10-24 01:06:06,404] INFO stream-thread [_confluent-ksql-default_query_InsertQuery_173-c2066571-ae54-4f4e-8195-a6bf13b41f08-StreamThread-693] partition assignment took 1717868 ms.
	current active tasks: [0_0, 0_4]
	current standby tasks: []
	previous active tasks: []
 (org.apache.kafka.streams.processor.internals.StreamThread:280)
```
After some lengthy monitoring and log tailing, here is what I think happens: this occurs even when reattaching a single KSQL instance, though it churns less frequently. The bigger problem is that I need multiple KSQL instances to keep up with my event stream. This is probably a good use case for allowing the forced removal of consumer groups with the new consumer; I honestly think that would be a workaround.
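For what it's worth, newer Kafka releases do support deleting groups held by the broker-side coordinator ("new consumer" groups) via `kafka-consumer-groups.sh --delete`. A rough sketch of the workaround described above; the bootstrap server address is illustrative, and the group id is taken from the log above:

```shell
# List consumer groups to find the stuck KSQL query's group
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Force removal of the group; this only succeeds once all members
# have left the group (or their sessions have timed out), so the
# KSQL instances may need to be stopped first
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --delete --group _confluent-ksql-default_query_InsertQuery_42
```

Whether deleting the group is safe while the query still exists is a separate question, since KSQL would then restart consumption from its configured auto offset reset policy.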
Hey @mbrancato, I have encountered the same issue. Did you manage to find a workaround for it?
Those errors stopped appearing on my side when I removed the following two env variables from my deployment:

```yaml
- name: KSQL_PRODUCER_INTERCEPTOR_CLASSES
  value: io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
- name: KSQL_CONSUMER_INTERCEPTOR_CLASSES
  value: io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
```
@mbrancato were you able to find a fix for this issue? Can you share your KSQL configuration?
I was not. This was an operational issue over a year ago and I'm guessing we rebuilt everything to get up and running again. I can close this if needed.
I ran into an issue where KSQL (5.0.0) stopped, or severely slowed, receiving messages from a topic. I tried increasing the partition count and adding more KSQL instances, but that didn't help. At some point, messages basically stopped flowing (I'm looking into Control Center for better visibility).
What I did find was that when restarting KSQL, it did not seem to reattach to the consumer groups properly. A consumer group would transition to RUNNING in the log output, but when monitoring that group with `kafka-consumer-groups.sh`, there was no increase in the current offset for any partition. I let this run for a long time and saw no movement in the current offsets. I then ran `kafka-console-consumer.sh` against one of the existing topics and immediately received events, so I think Kafka itself is working fine. Is there any known workaround or solution for this?
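One way to automate the manual check described above is to take two snapshots of `kafka-consumer-groups.sh --describe --group <group>` output and compare committed offsets. A minimal sketch; the column layout (`GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG ...`) is an assumption based on recent Kafka releases, and `parse_offsets`/`is_stalled` are hypothetical helper names:

```python
def parse_offsets(describe_output):
    """Map (topic, partition) -> committed offset from --describe output."""
    offsets = {}
    for line in describe_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and any malformed lines
        if len(fields) < 5 or fields[2] == "PARTITION":
            continue
        topic, partition, current = fields[1], int(fields[2]), fields[3]
        if current != "-":  # "-" means no offset has been committed yet
            offsets[(topic, partition)] = int(current)
    return offsets

def is_stalled(before, after):
    """True if no partition's committed offset advanced between snapshots."""
    b, a = parse_offsets(before), parse_offsets(after)
    common = set(b) & set(a)
    return bool(common) and all(a[k] <= b[k] for k in common)
```

Running this on snapshots taken a few minutes apart would distinguish a genuinely stalled group (as described here) from one that is merely consuming slowly.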