jpeel opened this issue 7 years ago
Could you post a reproducer to help us investigate this more quickly, along with the specific problem you're referring to, rather than us attempting to reproduce the situation from the description? Thanks in advance!
Hm, I've read the writeup again. The merging etc. is not really core to the issue AFAICS; you simply mean that Resume does not work on the Producer. Which may be true, we'll look into that - thanks for reporting.
In general one should not over-use the supervision strategies of Akka Streams; they sometimes do counter-intuitive (even if "correct" in their own ways...) things. We have a plan to revisit supervision in streams, but in general it is not as much of a core concept as it is with Actors - streams handle "fail forward / cancel backwards", and Resume specifically is a bit of a tricky one.
Yes, I have found that I need to be careful with the resume supervision strategy. I wanted to point out that I am specifically talking about the consumer. The Kafka consumer is shut down when the KafkaConsumerActor is terminated due to the wake-up timeout, and after the resume the Kafka consumer is still shut down.
However, with the Kafka producer it might be okay to resume, depending on the exception that was thrown, since the Kafka producer isn't shut down in that case. I haven't verified that this is true, though.
I don't have a canned example to reproduce the issue, or the time to create one right now. To do so I would probably reduce the wake-up timeout to a very low value and have a few consumers join the same consumer group for a topic with a substantial number of partitions. Our production case where we saw this issue uses 100 partitions and 20+ brokers.
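
For what it's worth, a rough sketch of that reproduction idea, assuming the 0.x-era wake-up settings (`akka.kafka.consumer.wakeup-timeout` / `max-wakeups`, which were removed in later releases) and placeholder broker, group, and topic names:

```scala
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import com.typesafe.config.ConfigFactory
import org.apache.kafka.common.serialization.StringDeserializer

object WakeupRepro extends App {
  // Drive the wake-up timeout very low so slow polls trip it quickly.
  // (These keys are from the 0.x-era reference.conf; adjust for your version.)
  val config = ConfigFactory.parseString(
    """
      |akka.kafka.consumer.wakeup-timeout = 100ms
      |akka.kafka.consumer.max-wakeups = 2
    """.stripMargin).withFallback(ConfigFactory.load())

  implicit val system: ActorSystem = ActorSystem("wakeup-repro", config)
  implicit val mat: ActorMaterializer = ActorMaterializer()

  val settings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092") // placeholder
      .withGroupId("wakeup-repro-group")

  // Several consumers in the same group against a topic with many partitions:
  // rebalances plus the tiny wake-up timeout make the stage fail fairly quickly.
  (1 to 3).foreach { _ =>
    Consumer
      .plainSource(settings, Subscriptions.topics("many-partition-topic")) // placeholder
      .runWith(Sink.ignore)
  }
}
```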
I have this issue with the consumer. To reproduce it, just configure your Source with a StringDeserializer and then send an Integer to the topic. After the SerializationException the Kafka consumer is closed, even with Supervision.Resume. The version is akka-stream-kafka 0.20.
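
A minimal sketch of that kind of setup, attaching a resuming supervision strategy to the consumer source; the bootstrap servers, group id, and topic are placeholders, and the map stage is only there to have some processing downstream:

```scala
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.{ActorAttributes, ActorMaterializer, Supervision}
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

object ResumeRepro extends App {
  implicit val system: ActorSystem = ActorSystem("resume-repro")
  implicit val mat: ActorMaterializer = ActorMaterializer()

  val settings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092") // placeholder
      .withGroupId("resume-repro-group")

  // Resume is supposed to drop the failing element and carry on, but once the
  // failure has shut down the underlying KafkaConsumer the source never emits again.
  val resumeOnAnyError =
    ActorAttributes.supervisionStrategy(Supervision.resumingDecider)

  Consumer
    .plainSource(settings, Subscriptions.topics("test-topic")) // placeholder
    .map(_.value())
    .withAttributes(resumeOnAnyError)
    .runWith(Sink.foreach(println))
}
```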
4/5 years later and we are also facing the same issue using akka-stream-kafka 2.1.1.
Any pointers on how to solve skipping messages without using the "broken" Resume strategy?
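
Not something from this thread, but one pattern that is often suggested for skipping undecodable records without relying on Resume is to consume raw bytes and deserialize inside the stream, where a failed decode can simply be dropped. A rough sketch, with `decode` standing in for whatever real deserialization is needed:

```scala
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.ByteArrayDeserializer

import java.nio.charset.StandardCharsets
import scala.util.{Success, Try}

object SkipBadRecords extends App {
  implicit val system: ActorSystem = ActorSystem("skip-bad-records")

  // Consume raw bytes so the KafkaConsumer itself never fails on deserialization.
  val settings =
    ConsumerSettings(system, new ByteArrayDeserializer, new ByteArrayDeserializer)
      .withBootstrapServers("localhost:9092") // placeholder
      .withGroupId("skip-bad-records-group")

  // Placeholder for whatever real decoding is needed (JSON, Avro, ...).
  def decode(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)

  Consumer
    .plainSource(settings, Subscriptions.topics("test-topic")) // placeholder
    .map(record => Try(decode(record.value()))) // decode inside the stream
    .collect { case Success(value) => value }   // records that fail to decode are dropped
    .runWith(Sink.foreach(println))
}
```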
The current reactive kafka consumer does not work with the resuming supervision strategy. If the stage fails, for instance as a result of the wake-up timeout limit being reached, then the underlying Kafka consumer is closed and the stage just sits there after the resume.
If another Kafka consumer source is merged downstream with the resumed stage, this second consumer source will not send messages to the merged stream. I believe the stream is still marked as closed, but I'm not fully clear on this part. My code has an alsoTo() after the Kafka consumer stage which handles the commits. That commit stream is still being fed, so the commits are happening, but no messages are being sent to the merged stream.
It would be nice for the resume supervision strategy to work for the Kafka source, but at the very least there should be a warning in the documentation that it shouldn't be used with the Kafka consumer stage.
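
For reference, a very rough sketch of the shape of topology described above (consumer source, alsoTo() for the commits, merged with a second consumer source, resuming supervision). It uses current-ish Alpakka Kafka APIs and placeholder names, not the reporter's actual code:

```scala
import akka.actor.ActorSystem
import akka.kafka.ConsumerMessage.CommittableMessage
import akka.kafka.scaladsl.{Committer, Consumer}
import akka.kafka.{CommitterSettings, ConsumerSettings, Subscriptions}
import akka.stream.{ActorAttributes, Supervision}
import akka.stream.scaladsl.{Flow, Sink}
import org.apache.kafka.common.serialization.StringDeserializer

object MergedConsumersSketch extends App {
  implicit val system: ActorSystem = ActorSystem("merged-consumers")

  val settings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092") // placeholder
      .withGroupId("merged-consumers-group")

  // Commits are handled on a side channel via alsoTo(), as in the description above.
  val commitSink =
    Flow[CommittableMessage[String, String]]
      .map(_.committableOffset)
      .to(Committer.sink(CommitterSettings(system)))

  def consumerSource(topic: String) =
    Consumer
      .committableSource(settings, Subscriptions.topics(topic))
      .alsoTo(commitSink)

  val resume = ActorAttributes.supervisionStrategy(Supervision.resumingDecider)

  // If consumerSource("topic-a") fails and is "resumed", its KafkaConsumer stays
  // closed and, per the report, the merged stream stops emitting even from topic-b.
  consumerSource("topic-a")
    .merge(consumerSource("topic-b"))
    .withAttributes(resume)
    .map(_.record.value())
    .runWith(Sink.foreach(println))
}
```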