Q: Where inside events-reader should the consumer be placed?
A: The consumer will need to be split out of the events-reader file. Maybe call it events-consumer or events-runner
Q: Topics cannot be created on the fly, so we will have to create Kafka topics beforehand - what should the names of those topics be?
A: This doesn't make sense to me (that topics can't be created on the fly) - how would Kafka know the operating context of your service? It frankly *must* allow topic creation, since it cannot know the context of creation. Most likely I don't understand the real question - we should discuss this in person.
Q: There should be more than one topic?
A: Absolutely. Each service:collection pair should have its own topic. So fuse-telemetry:canalarms would be a topic, and iam:permissions may also be a topic. We should discuss topic naming (we need to namespace so you can't get collisions across services on our platform).
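Something like this, maybe (the `buildTopicName` helper and the `platform` prefix are just made up to illustrate the service:collection idea, not an agreed convention):

```js
// Hypothetical helper, only to illustrate the service:collection namespacing.
// Kafka topic names are limited to [a-zA-Z0-9._-], so anything else is replaced.
function buildTopicName(namespace, service, collection) {
  return [namespace, service, collection]
    .map(part => String(part).toLowerCase().replace(/[^a-z0-9._-]/g, '-'))
    .join('.');
}

buildTopicName('platform', 'fuse-telemetry', 'canalarms'); // 'platform.fuse-telemetry.canalarms'
buildTopicName('platform', 'iam', 'permissions');          // 'platform.iam.permissions'
```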
https://github.com/kristofsajdak/hapi-rx-sse-examples/blob/master/kafka-sse-filter/kafka-sse-filter.js#L49 -> doesn't this solve your first blocker?
@ssebro no, the problem is not from Kafka to SSE, it is from the oplog to Kafka. We would have to know what was the last thing produced to Kafka.
@Juraci there should only be one piece of code streaming from the oplog to kafka, and that piece of code should already have a debounced checkpoint write function - can't you just call it? If you need to store more details in the checkpoint write, that also seems very trivial.
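Roughly something like this - a sketch only, where `saveCheckpoint` and the checkpoint shape are placeholders rather than existing harvesterjs code:

```js
// Sketch of a debounced checkpoint writer for the oplog -> Kafka streamer.
// `saveCheckpoint` stands in for however the checkpoint gets persisted
// (e.g. an upsert into a 'checkpoints' collection).
function createCheckpointWriter(saveCheckpoint, delayMs = 1000) {
  let latest = null;
  let timer = null;

  return function recordProduced(topic, oplogTs) {
    latest = { topic, oplogTs, updatedAt: new Date() };
    if (timer) return;             // a flush is already scheduled
    timer = setTimeout(() => {
      timer = null;
      saveCheckpoint(latest);      // write at most once per delayMs
    }, delayMs);
  };
}

// usage sketch (the persistence function is a placeholder):
// const recordProduced = createCheckpointWriter(upsertCheckpointInMongo, 1000);
```

The oplog-to-Kafka streamer would call `recordProduced` after each successful produce and read the stored checkpoint on startup to know where to resume tailing.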
@ssebro the whole point of using Kafka would be to decrease the amount of operations on the oplog. Using a checkpoint writer or anything else to track messages from the oplog to Kafka just defeats that purpose. Kafka would be just another queue that we have to keep in sync, which seems to be a lot of work for absolutely no benefit.
I guess the main point is how we are dealing with our tools (Kafka and Mongo).
First, we are taking events from our application and streaming them out over the internet to Kafka. Then the same app which sent the event away consumes it back to trigger event handlers in the app. (I understand the concern about the app going down and having to replay some events, but in my opinion far worse things could happen if Kafka dies, for example.)
The second thing is more related to the need for atomicity between persisting data and streaming the event out. According to this article, before Mongo returns an acknowledgment to the app it creates a snapshot. So even if Mongo dies after we have sent the event away, when Mongo restarts we will still have the data persisted. That's why I don't see the reason for tailing the oplog to stream events.
This Pull Request is intended to document the experiments we have made streaming events from the oplog to Kafka and from Kafka to SSE.
Hypothesis: We believe that adding Kafka to the harvesterjs SSE implementation will result in a decreased number of queries to the oplog. We will have confidence to proceed when we see a decreased number of connections to the oplog for a single resource.
Results:
[Blocker] It would be necessary to have a mechanism in place to figure out what was the last thing transferred from the oplog to a particular Kafka topic, and to do so per client by taking the last-event-id into consideration (a rough sketch of that bookkeeping is after this list). This alone invalidates the hypothesis.
[Blocker] There is no way to create topics programmatically on Kafka. See the no-kafka documentation.
Harvesterjs-based apps would have to have Kafka infrastructure in place.
Kafka slows down the SSE process by several seconds.
This is not the recommended use of Kafka (the same app producing to and consuming from a particular topic). Kafka is meant for distributed systems; the main benefits of having Kafka as a central queue would be lost with this approach.
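For illustration, a rough sketch of the bookkeeping the first blocker implies (all names are hypothetical, nothing like this exists in harvesterjs, and it assumes SSE event ids sort the same way the oplog timestamps do):

```js
// The oplog -> Kafka streamer would have to remember, per topic, the last
// oplog timestamp it produced, and the Kafka -> SSE side would have to map a
// reconnecting client's Last-Event-ID back onto that stream.

const lastProducedTs = new Map(); // topic -> last oplog ts produced to Kafka

// Called by the oplog -> Kafka streamer after each successful produce.
function markProduced(topic, oplogTs) {
  lastProducedTs.set(topic, oplogTs);
}

// Called when an SSE client reconnects with a Last-Event-ID header.
// If the client is behind what was already produced, its missing events may
// still be in the topic; if it is ahead (or nothing was produced yet), the
// only safe option is to replay from the oplog directly.
function resumePointFor(topic, lastEventId) {
  if (!lastEventId) return { source: 'kafka', from: 'latest' };
  const produced = lastProducedTs.get(topic);
  if (produced === undefined || lastEventId > produced) {
    return { source: 'oplog', from: lastEventId };
  }
  return { source: 'kafka', from: lastEventId };
}
```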