Today we retain all submitted job information in the datastore indefinitely, which is slowing down bacalhau list operations.
We need a garbage collector that deletes all completed job information, including executions, evaluations and history events, once a configurable JobGCThreshold window has elapsed since the job's completion time.
The garbage collector should be agnostic of the job store's backend implementation so we can reuse it with different implementations, such as NATS KV.
We should also explore different GC configurations for different models. For example:
- Evaluations are just triggers to evaluate jobs and can be deleted within a few hours of being processed, even if the job is not completed yet.
- History events may grow out of hand, especially once we add job updates or see more use of long-running jobs.
- Rejected or failed executions of a long-running job can be deleted even before the job is marked as completed.
Proposal:
We can limit the scope of this issue to implementing GC for jobs and evaluations, and then open a follow-up issue to decide how best to manage the state of history events and executions. Compacting those events may be a better approach than simply deleting them based on their age.
The configurations should look like:
# Job store configuration
StateStore:
  JobGCInterval: 10m
  JobGCThreshold: 30M
  EvalGCThreshold: 1h
  Backend:
    Type: BoltDB
    Config: {} # config related to the backend type