TheDataRideAlongs / ProjectDomino

Scaling COVID public behavior change and anti-misinformation
Apache License 2.0

message queue #3

Open lmeyerov opened 4 years ago

lmeyerov commented 4 years ago

Add a message queue with buffering and restart capabilities, such as managed Kafka on Azure.

Right now, if we submit a large batch of IDs to Twitter (e.g., COVID 50M), or if our Neo4j goes down for 12 hours of maintenance, we risk data loss and manual retry effort. A queue like Kafka would make this simpler.
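To make the buffering and restart idea concrete, here is a minimal sketch of the behavior a queue would give us. It is not Kafka and not a proposal for the actual implementation: `DurableQueue`, `drain`, and the `sink` callback are hypothetical names, and SQLite from the Python standard library stands in for the broker. Messages stay stored until they are acknowledged, so a downstream outage only pauses delivery instead of losing data.

```python
import sqlite3


class DurableQueue:
    """Hypothetical SQLite-backed queue: a message persists until acknowledged."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to survive process restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.db.commit()

    def put(self, body):
        """Buffer one message (for example, a tweet ID to hydrate)."""
        self.db.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        self.db.commit()

    def peek(self):
        """Return the oldest unacknowledged (id, body) pair, or None if empty."""
        return self.db.execute(
            "SELECT id, body FROM messages ORDER BY id LIMIT 1"
        ).fetchone()

    def ack(self, msg_id):
        """Delete a message once the consumer has handled it."""
        self.db.execute("DELETE FROM messages WHERE id = ?", (msg_id,))
        self.db.commit()

    def __len__(self):
        return self.db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]


def drain(queue, sink):
    """Deliver messages in order; stop at the first failure.

    `sink` is a hypothetical callable (e.g., a database write) that raises
    ConnectionError while the downstream system is unavailable.
    Returns the number of messages delivered in this pass.
    """
    delivered = 0
    while True:
        msg = queue.peek()
        if msg is None:
            return delivered
        msg_id, body = msg
        try:
            sink(body)
        except ConnectionError:
            # Leave this message and everything after it buffered for a retry.
            return delivered
        queue.ack(msg_id)
        delivered += 1
```

Calling `drain` again after the downstream system recovers resumes from the first unacknowledged message, which replaces the manual retry step. If the process dies between `sink` and `ack`, that message is delivered again, so the sink should tolerate duplicates.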

007vasy commented 4 years ago

Does it have to be on Azure?

lmeyerov commented 4 years ago

Our ideal is either:

-- Simply self-hosted (cloud agnostic), e.g., a Docker container of something simple
-- Something more complicated but still OSS (like Kafka) that initially gets run through a managed service, so there is less management pain for us

We currently have ~free compute on Azure, and will likely get some non-Azure support soon as well.