bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0

Job store garbage collection #4174

Open wdbaruni opened 2 days ago

wdbaruni commented 2 days ago

Today we retain information about every submitted job in the datastore indefinitely, which slows down bacalhau list operations.

We need a garbage collector that deletes all information about completed jobs, including their executions, evaluations, and history events, once a configurable JobGCInterval window has elapsed since their completion time.

The garbage collector should be agnostic of the job store's backend implementation so that we can reuse it with different implementations, such as NATS KV.

We should also explore different GC configurations for different models. For example:

Proposal:

We can limit the focus of this issue to implementing GC for jobs and evaluations, and then open a follow-up issue to decide how best to manage the state of history events and executions. Compacting those events may be a better approach than simply deleting them based on their age.

The configuration should look like:

  # Job store configuration
  StateStore:
    JobGCInterval: 10m
    JobGCThreshold: 30m
    EvalGCThreshold: 1h
    Backend:
      Type: BoltDB
      Config: {} # config related to the backend type