Research: how should queue worker gracefully (re)start?

Sotera / watchman

Watchman: An open-source social-media event-detection system

GNU General Public License v2.0


Research: how should queue worker gracefully (re)start? #87

Closed. lukewendling closed this issue 7 years ago

lukewendling commented 7 years ago

Problem: after deployment, or service (re)start, services immediately begin to pull existing items from redis lists.

brainstorm: what to do on container start/restart, on deploy, and on slc service start/restart?

what happens to python services that are working on related jobs when keys are deleted by the queue worker?

cc @justinlueders

lukewendling commented 7 years ago

Proposed Solution

Phase 1 sledgehammer: mimic reset-all.sh -> reset smposts, rm all other tables, kill all redis keys. Why? While the project is still in active dev, it's less confusing to start with a clean canvas.
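
A minimal sketch of the "kill all redis keys" step only, assuming a redis-py style client that exposes `scan_iter` and `delete`. The function name `clear_redis_keys` and the `pattern` and `batch_size` parameters are hypothetical, and this is not the contents of reset-all.sh. Resetting smposts and removing the other tables is not covered here.

```python
def clear_redis_keys(client, pattern="*", batch_size=500):
    """Delete every key matching `pattern` and return how many were removed.

    `client` is assumed to behave like redis.Redis: scan_iter() yields key
    names and delete(*names) returns the number of keys actually deleted.
    """
    deleted = 0
    batch = []
    # SCAN walks the keyspace incrementally, so it does not block the
    # server the way a single KEYS call can on a large database.
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            deleted += client.delete(*batch)
            batch = []
    if batch:
        deleted += client.delete(*batch)
    return deleted
```

With the default pattern this wipes everything in the selected database, which matches the sledgehammer intent. A narrower pattern (a hypothetical prefix such as `"jobs:*"`) would leave unrelated keys alone.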

Phase 2 paintbrush: find incomplete jobsets and jobmonitors and restart them; remove the related postsclusters. Challenge: how to rm postsclusters ids from aggclusters? Are they referenced anywhere else?
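
One possible shape for the paintbrush, sketched over plain dicts rather than the real models. Every field name here (`id`, `state`, `job_monitor_id`, `posts_cluster_ids`) and the `"done"` state value are assumptions for illustration, not the actual Watchman schema.

```python
def plan_restart(job_monitors, posts_clusters, done_state="done"):
    """Return (ids of jobmonitors to restart, ids of postsclusters to remove).

    A jobmonitor counts as incomplete when its `state` is not `done_state`;
    a postscluster is stale when it belongs to an incomplete jobmonitor.
    """
    incomplete = {m["id"] for m in job_monitors if m["state"] != done_state}
    stale = [c["id"] for c in posts_clusters
             if c["job_monitor_id"] in incomplete]
    return sorted(incomplete), stale


def strip_cluster_ids(agg_clusters, stale_ids):
    """Drop stale postscluster ids from each aggcluster, in place.

    Returns the ids of aggclusters left with no members, since those
    probably need a decision of their own (delete or keep empty).
    """
    stale = set(stale_ids)
    emptied = []
    for agg in agg_clusters:
        agg["posts_cluster_ids"] = [
            cid for cid in agg["posts_cluster_ids"] if cid not in stale
        ]
        if not agg["posts_cluster_ids"]:
            emptied.append(agg["id"])
    return emptied
```

This only answers the aggclusters part of the challenge under the assumption that aggclusters hold a flat list of postscluster ids. Whether the ids are referenced anywhere else remains open.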

lukewendling commented 7 years ago

closing. after migrating from pubsub to list-based redis processing, and moving job-scheduler to its own microservice, deployments/restarts are considerably easier to manage.
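
For illustration, one common way list-based processing makes restarts easier is the Redis "reliable queue" pattern: a worker moves each item into a processing list while it works, so a restart can push unfinished items back. This is a sketch under that assumption, not necessarily how Watchman does it. The helper names and list names are hypothetical; `rpoplpush` and `lrem` are real redis-py calls (newer Redis versions offer `LMOVE` as the replacement for `RPOPLPUSH`).

```python
def claim_job(client, pending, processing):
    """Atomically move the oldest job from `pending` to `processing`.

    Returns the job, or None when `pending` is empty. Producers are
    assumed to LPUSH onto `pending`, so the tail holds the oldest item.
    """
    return client.rpoplpush(pending, processing)


def ack_job(client, processing, job):
    """Remove one finished job from the processing list."""
    client.lrem(processing, 1, job)


def requeue_in_flight(client, pending, processing):
    """On (re)start, push jobs that were claimed but never acked back
    onto `pending`. Returns how many jobs were moved."""
    moved = 0
    while client.rpoplpush(processing, pending) is not None:
        moved += 1
    return moved
```

A worker would call `requeue_in_flight` once at startup, then loop on `claim_job` and `ack_job`. With pubsub, by contrast, a message published while the subscriber is down is simply lost.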