flipp-oss / deimos

Framework to work with Kafka, Avro and ActiveRecord

Add Rake task to auto-send ActiveRecordProducer events #14

Closed dorner closed 4 years ago

dorner commented 4 years ago

Currently, our workflow sends events via ActiveRecord callbacks. This works for most situations, but for codebases that make heavy use of import / insert_all / update_all (which bypass callbacks), it becomes tricky and error-prone to remember to send events after every bulk write.
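To illustrate the problem, here is a minimal pure-Ruby sketch (a stand-in, not real ActiveRecord): per-record save hooks fire on create, but a bulk insert path writes rows directly and never runs them, so events are silently dropped.

```ruby
# Simplified stand-in for an ActiveRecord model: per-record "after_save"
# logic fires on create, but a bulk insert writes rows directly and
# never runs it.
class Widget
  class << self
    attr_reader :rows, :events
  end
  @rows = []
  @events = []

  def self.create(attrs)
    @rows << attrs
    @events << attrs   # callback-style hook: send a Kafka event per save
  end

  # Analogous to insert_all: one bulk write, no per-record callbacks
  def self.insert_all(list)
    @rows.concat(list) # no events are sent here
  end
end

Widget.create(id: 1)
Widget.insert_all([{ id: 2 }, { id: 3 }])
Widget.rows.size   # 3 rows written
Widget.events.size # but only 1 event sent
```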

For these use cases, a better pattern is a separate task that polls the relevant tables and sends events for records whose updated_at is newer than the last poll. This ensures that no changes are missed, and has the additional advantage of batching all updates at once.
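The core of one poll cycle can be sketched in plain Ruby (a hypothetical simplification; in practice the select would be a SQL query against updated_at): fetch records changed since the last seen timestamp, send them as one batch, and record the new high-water mark.

```ruby
require "time"

# One poll cycle: select records whose updated_at is after the last
# timestamp we processed, and remember the new high-water mark.
def poll_batch(records, last_seen)
  batch = records.select { |r| r[:updated_at] > last_seen }
                 .sort_by { |r| r[:updated_at] }
  new_mark = batch.empty? ? last_seen : batch.last[:updated_at]
  [batch, new_mark]
end

t = Time.utc(2020, 1, 1)
records = [
  { id: 1, updated_at: t },
  { id: 2, updated_at: t + 10 },
  { id: 3, updated_at: t + 20 },
]
batch, mark = poll_batch(records, t)
batch.map { |r| r[:id] } # ids 2 and 3 changed since the last poll
mark                     # t + 20, the new high-water mark
```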

The downside of this pattern is that many events may be sent unnecessarily (when only columns the producer doesn't care about change), and it adds extra, possibly significant, DB reads. It also introduces at least some delay between when the DB is updated and when Kafka is notified.

Another downside is that it can't handle deletes, since deleted rows no longer appear in the polled table. There is unfortunately no easy workaround for this. App code can use a combination of KafkaSource (for deletes only) and this pattern.

This feature should add another table, e.g. kafka_source_updates, to store the most recently seen updated_at value. The Rake task should use an Executor to continually poll the database and use ActiveRecordProducer to send the relevant events. There should be some overlap (1 second? 5 seconds?) between the most recently seen updated_at and the first updated_at the next poll searches for, so that rows committed slightly out of order are not missed.
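The state-tracking piece might look like the sketch below (an in-memory stand-in for the proposed kafka_source_updates table; the class name, OVERLAP constant, and method names are all hypothetical). Each poll queries a window that starts slightly before the stored timestamp, trading duplicate sends for guaranteed coverage.

```ruby
require "time"

# Hypothetical in-memory stand-in for the proposed kafka_source_updates
# table: holds the most recently seen updated_at for a producer.
class PollInfo
  OVERLAP = 5 # seconds of overlap between polls, per the proposal

  def initialize(start_time)
    @last_seen = start_time
  end

  attr_reader :last_seen

  # The window the next poll should query: start slightly before the
  # last seen updated_at so out-of-order commits are not missed.
  def window(now)
    [@last_seen - OVERLAP, now]
  end

  # After a successful batch send, advance the high-water mark.
  def advance!(batch_max)
    @last_seen = batch_max if batch_max && batch_max > @last_seen
  end
end

info = PollInfo.new(Time.utc(2020, 1, 1))
from, to = info.window(Time.utc(2020, 1, 1, 0, 1))
from # 5 seconds before the stored timestamp
info.advance!(Time.utc(2020, 1, 1, 0, 0, 30))
```

Because the windows overlap, consumers (or the task itself) should tolerate seeing the same record twice, e.g. by deduplicating on primary key within a batch.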