bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0

Distributed State Store #2330

Open aronchick opened 1 year ago

aronchick commented 1 year ago

We need a distributed state store, particularly for things like configs and environment variables.

https://youtu.be/yuxd2kurpzk?t=909

Scenario:

wdbaruni commented 6 months ago

I am not sure if we will be adding much value by providing our own state store to users. I can think of two options for jobs to checkpoint their progress:

  1. The job implementation itself writes some state as it makes progress, and fetches that state during startup to know where to resume from. Users can persist this state in any remote store they have access to, including S3. I don't think providing our own state store with different APIs would add value for them. We can provide a dedicated S3 bucket as part of our managed offering.
  2. Bacalhau tracks progress on its own and checkpoints the state internally, without the user having to do so. This requires either providing our own framework for users to write their applications, or a pipeline job type where we keep track of completed tasks and only retry or restart from the failed ones.
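
To make the two options concrete, here is a rough sketch in Python. Nothing in it is an existing Bacalhau API: `FileStateStore`, `run_job`, and `run_pipeline` are hypothetical names, and the store writes to a local JSON file purely as a stand-in for a remote store such as an S3 bucket. `run_job` illustrates option 1 (the job checkpoints itself), and `run_pipeline` illustrates option 2 (the runner tracks completed tasks).

```python
import json
import os


class FileStateStore:
    """Hypothetical stand-in for a remote store (e.g. an S3 bucket)."""

    def __init__(self, path):
        self.path = path

    def load(self, default):
        # Return the last saved state, or `default` on a first run.
        if not os.path.exists(self.path):
            return default
        with open(self.path) as f:
            return json.load(f)

    def save(self, state):
        # Write to a temp file and rename it, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)


def run_job(items, store, process):
    """Option 1: the job checkpoints its own progress and resumes from it."""
    state = store.load({"next_index": 0})
    for i in range(state["next_index"], len(items)):
        process(items[i])
        # Record progress only after the item has been processed.
        store.save({"next_index": i + 1})


def run_pipeline(tasks, store):
    """Option 2: the runner tracks completed tasks and skips them on retry."""
    done = set(store.load([]))
    for name, task in tasks:
        if name in done:
            continue  # finished in an earlier attempt
        task()
        done.add(name)
        store.save(sorted(done))
```

In both cases the checkpoint is written after the work is done, so a crash between the two steps means that item or task runs again on restart. That gives at-least-once semantics, which is fine as long as the work is idempotent.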