Currently Kala periodically persists the entire in-memory job cache to the DB. This is not fault-tolerant: if the service is stopped or crashes between a job's creation and the next persistence pass, the job is lost. The same applies to delete and update operations.
Furthermore, certain operations update the relationships between jobs (parents, children) asynchronously.
Therefore:
[ ] Change the persistence mechanism so that each job is persisted individually to the DB on every write operation, and the write returns a failure if the DB operation fails.
[ ] Review code that handles the relationships between jobs and determine if this can or should be made transactional.
Note that we cannot move to a DB-only architecture because scheduling is based on in-memory timers, so we still need the in-memory cache layer.