agco / harvesterjs

Create JSONAPI-compliant APIs over a Node.js + MongoDB stack in an easy, boilerplate-free manner
http://agco.github.io/harvesterjs/
MIT License

[Do not merge][SSE] producing message from oplog to kafka and from kafka to SSE #224

Closed Juraci closed 5 years ago

Juraci commented 7 years ago

This Pull Request is intended to document the experiments we have made streaming events from the oplog to Kafka and from Kafka to SSE.

Hypothesis: We believe that adding Kafka to the harvesterjs SSE implementation will result in a decreased number of queries against the oplog. We will have confidence to proceed when we see a decreased number of connections to the oplog for a single resource.
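The shape of the pipeline under test can be sketched roughly as follows, with an in-memory queue standing in for Kafka (the real experiment used an actual broker; every name here is illustrative, none of it is harvesterjs API). One producer tails the oplog and publishes each change; any number of SSE handlers consume from the topic instead of each opening its own oplog cursor, which is what should reduce oplog connections per resource.

```javascript
// In-memory stand-in for a Kafka topic (illustration only).
class InMemoryTopic {
  constructor() { this.subscribers = []; }
  publish(message) { this.subscribers.forEach(fn => fn(message)); }
  subscribe(fn) { this.subscribers.push(fn); }
}

// Oplog side: a single producer publishes every change event once.
function produceFromOplog(oplogEvents, topic) {
  oplogEvents.forEach(evt => topic.publish(evt));
}

// SSE side: each client connection subscribes to the topic rather than
// tailing the oplog itself; `write` sends a chunk down the HTTP response.
function sseHandler(topic, write) {
  topic.subscribe(evt => write('data: ' + JSON.stringify(evt) + '\n\n'));
}
```

With this split, adding a second SSE client adds a subscriber to the topic, not another oplog connection.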

Results:

ssebro commented 7 years ago

Q: Where inside events-reader should the consumer be placed?

A: The consumer will need to be split out of the events-reader file. Maybe call it events-consumer or events-runner.

Q: The topics cannot be created on the fly, so we will have to create a Kafka topic beforehand. What should the names of the topics be?

A: This doesn't make sense (topics can't be created on the fly): how would Kafka know the operating context of your service? It frankly *must* allow topic creation, and it cannot know the context of creation. Most likely I don't understand the real question; we should discuss this in person.

Q: Should there be more than one topic?

A: Absolutely. Each service:collection pair should have its own topic. So fuse-telemetry:canalarms would be a topic, and iam:permissions may also be a topic. We should discuss topic naming (we need to namespace so you can't get collisions across services on our platform).
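One possible namespacing scheme for the per-service, per-collection topics discussed above could look like this (the platform prefix, separator, and function name are all assumptions for illustration, not anything harvesterjs defines):

```javascript
// Build a collision-free Kafka topic name from a platform-wide prefix,
// a service name, and a collection name. The prefix is what prevents two
// services on different platforms from ever sharing a topic by accident.
function topicName(platform, service, collection) {
  return [platform, service, collection].join('.');
}

// e.g. topicName('agco', 'fuse-telemetry', 'canalarms')
//   -> 'agco.fuse-telemetry.canalarms'
```

Whatever the final convention, keeping it in one shared helper means a rename only has to happen in one place.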

ssebro commented 7 years ago

https://github.com/kristofsajdak/hapi-rx-sse-examples/blob/master/kafka-sse-filter/kafka-sse-filter.js#L49 -> doesn't this solve your first blocker?

Juraci commented 7 years ago

@ssebro no, the problem is not going from Kafka to SSE; it is going from the oplog to Kafka. We would have to know what the last thing produced to Kafka was.

ssebro commented 7 years ago

@Juraci there should only be one piece of code streaming from the oplog to kafka, and that piece of code should already have a debounced checkpoint write function - can't you just call it? If you need to store more details in the checkpoint write, that also seems very trivial.
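The debounced checkpoint write mentioned above might be sketched like this (the names, the save callback, and the shape of the checkpoint value are assumptions; the actual events-reader code may differ). The point is that the producer remembers only the newest oplog position and flushes it at most once per interval, so many produced messages cost a single checkpoint write:

```javascript
// Returns a checkpoint function. Calling it records the latest oplog
// position; the `save` callback (e.g. a Mongo write) fires at most once
// per `intervalMs`, covering every call made in the meantime.
function makeCheckpointWriter(save, intervalMs) {
  let pending = null; // newest position seen so far
  let timer = null;   // non-null while a flush is scheduled
  return function checkpoint(position) {
    pending = position;        // always keep the most recent position
    if (timer) return;         // a flush is already scheduled
    timer = setTimeout(function () {
      timer = null;
      save(pending);           // one write covers many messages
    }, intervalMs);
  };
}
```

On restart, the producer reads the last saved position and resumes tailing the oplog from there, so the cost of checkpointing stays bounded regardless of message volume.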

Juraci commented 7 years ago

@ssebro the whole point of using Kafka would be to decrease the number of operations against the oplog. Using a checkpoint writer, or anything else, to track messages from the oplog to Kafka defeats that purpose: Kafka would just be another queue that we have to keep in sync, which seems to be a lot of work for no benefit.

lairjr commented 7 years ago

I guess the main point is how we are using our tools (Kafka and Mongo).

First, we are observing events in our application and streaming them out over the internet to Kafka. Then the same app that sent the event away consumes it back in order to trigger event handlers. (I understand the concern about the app going down and having to replay some events, but in my opinion much worse things could happen if, for example, Kafka dies.)

The second point is more related to the need for atomicity between persisting data and streaming the event out. According to this article, before Mongo returns an acknowledgment to the app it creates a snapshot. So even if Mongo dies after we have sent the event away, when Mongo restarts we will still have the data persisted. That's why I don't see the reason for tailing the oplog to stream events.