LearnersGuild / idm

identity management service
MIT License

attempt `newGameUser` jobs 3x w/ 60s b/t each attempt #141

Closed by jeffreywescott 7 years ago

jeffreywescott commented 7 years ago

Fixes #140.

Does not fix https://github.com/LearnersGuild/game/pull/558, which is a much bigger lift and will require significant refactoring.

Related: https://github.com/LearnersGuild/game/issues/510

Overview

We use bull for our job queues, which by default do not automatically retry failed jobs. This means that if an operational problem causes a job to fail (e.g., the database is in the middle of a restart), we just give up.

We should retry jobs to make the system more resilient. That's what this PR is about.

In addition, to help us with debugging, I've added matador to the /job-queues route of game so that we can see the status of the various job queues.

Data Model / DB Schema Changes

N/A

Environment / Configuration Changes

`npm install` is required to install matador

Notes

Different types of jobs may need different retry settings. For example, some jobs may do only database-related work, while others might invoke remote APIs. When configuring the jobs to retry, I've used the following "rules" to determine how quickly the job should retry:

By default, all jobs will retry 3 times before giving up.
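As a rough illustration of the policy in the title (3 attempts, 60 seconds apart), the sketch below builds a job-options object in the shape that bull's `queue.add(data, opts)` accepts (`attempts` plus a fixed `backoff`). The helper name `retryOptions` and its defaults are hypothetical and are not taken from this PR's diff; note that in bull, `attempts` counts the total number of tries, including the first.

```javascript
// Hypothetical helper (not from the PR): builds bull-style job options
// for a fixed-delay retry policy.
function retryOptions(attempts = 3, delaySeconds = 60) {
  return {
    attempts, // total number of tries, including the first one
    backoff: { type: 'fixed', delay: delaySeconds * 1000 }, // delay in milliseconds
  };
}

// Options for the `newGameUser` job as described in the PR title.
const newGameUserOptions = retryOptions(3, 60);
console.log(newGameUserOptions);

// With a real bull queue (requires Redis), usage would look roughly like:
//   queue.add(payload, newGameUserOptions);
```

Keeping the policy in one small function would make it easy to give database-only jobs and remote-API jobs different delays later, if that turns out to be needed.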

jeffreywescott commented 7 years ago

This one is safe. Since the id of the user being added must be unique, there's no chance for it to be added multiple times. And the API calls to GitHub and HubSpot should be idempotent.
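To make the reasoning above concrete, here is a minimal sketch of why a uniqueness constraint on the id makes a retried job harmless. The in-memory `Map` and the `addUser` function are hypothetical stand-ins for the real database table and job handler, which are not shown in this PR description.

```javascript
// Hypothetical in-memory stand-in for a table with a unique `id` constraint.
const users = new Map();

// Adds the user only if the id is not already present.
// Returns true if the user was added, false if it already existed.
function addUser(user) {
  if (users.has(user.id)) {
    return false;
  }
  users.set(user.id, user);
  return true;
}

// Simulate the same job running twice, e.g. after a retry.
const firstRun = addUser({ id: 'abc123', handle: 'example' });
const secondRun = addUser({ id: 'abc123', handle: 'example' });
console.log(firstRun, secondRun, users.size);
```

Running the handler a second time leaves the store unchanged, which is the property that makes automatic retries safe for this job.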