contribsys / faktory

Language-agnostic persistent background job server
https://contribsys.com/faktory/

Batching and Uniqueness #450

Closed rocktavious closed 5 months ago

rocktavious commented 9 months ago

Hello All,

I have a question about a job design / orchestration problem we are facing. At a high level, we run a job type that is triggered by receiving a git webhook. We then enqueue a batch of jobs that can take a while to finish (in some cases 10-20 mins for the entire batch to complete all of its sub-jobs). During that time we may receive the git webhook again, maybe even 2 or 3 or 4 times. Since the jobs in question use the "default branch head" as the clone target, we don't need to process a batch for webhooks 2 and 3, but we would like 4 to stay in the queue.

This presents a problem when enqueuing batches, since they don't seem to support uniqueness (at least in Faktory). We'd really like to avoid building something custom to "debounce" this kind of incoming event, and were hoping there is a way to design our jobs and batches with Faktory primitives to achieve this.

I'm open to any and all suggestions for how to re-architect this so we can limit the amount of work we do. Once this scales out to (number of customers) x (git webhooks we receive), it would be WAY too much work and would not scale well if we don't debounce / deduplicate these events while the original batch is still being worked.

Please HALP! 😉

mperham commented 9 months ago

You should be able to create unique jobs within a batch. I'm not aware of any limits. Why do you say otherwise?

rocktavious commented 9 months ago

@mperham we need the "batch" to be the unique thing, because we need the success callback to run only once the batch is complete, and we don't want another batch (which represents all the work we need to do for a git webhook) to run until the first "batch" is complete. So when we get webhooks "num 2" and "num 3", those batches are skipped rather than enqueued, but the batch for webhook "num 4" is still enqueued.

We feel like maybe we are missing something in how to lay out the jobs, but we've tried a few concepts and nothing seems to get us the order of execution (or the deduplication) that we need.

mperham commented 9 months ago

That's a tough one. If you have a larger complex task which needs to be unique, I think you need to build that yourself.

Batches, because of their complexity, are not guaranteed to succeed. I don't offer batch uniqueness because I can't guarantee that the lock will be removed upon success.
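
For anyone landing here later, the "build that yourself" route could look roughly like the sketch below. It is not a Faktory feature and uses no Faktory API: `BatchDebouncer`, `enqueue_batch`, `on_webhook`, `on_batch_success` and `lock_ttl` are all hypothetical names, and the state is kept in process memory purely for illustration. It coalesces every webhook that arrives while a batch is in flight into a single pending run (so 2, 3 and 4 collapse into one follow-up batch), and it expires the lock after a TTL because a batch is not guaranteed to succeed and release it.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RepoState:
    running_since: Optional[float] = None  # start time of the in-flight batch, if any
    pending: bool = False                  # True if a webhook arrived while a batch was running


class BatchDebouncer:
    """Hypothetical in-memory debouncer: at most one running and one pending batch per repo."""

    def __init__(
        self,
        enqueue_batch: Callable[[str], None],         # assumed hook that submits the real batch
        lock_ttl: float = 3600.0,                     # seconds before a stuck lock is ignored
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self._enqueue_batch = enqueue_batch
        self._lock_ttl = lock_ttl
        self._clock = clock
        self._states: Dict[str, RepoState] = {}

    def on_webhook(self, repo: str) -> bool:
        """Handle a webhook. Returns True if a batch was enqueued now, False if coalesced."""
        state = self._states.setdefault(repo, RepoState())
        now = self._clock()
        expired = (
            state.running_since is not None
            and now - state.running_since > self._lock_ttl
        )
        if state.running_since is None or expired:
            # Nothing in flight (or the old lock looks abandoned): start a batch.
            state.running_since = now
            state.pending = False
            self._enqueue_batch(repo)
            return True
        # A batch is in flight: remember that one more run is needed, nothing else.
        state.pending = True
        return False

    def on_batch_success(self, repo: str) -> bool:
        """Call from the batch success callback. Returns True if a follow-up batch was enqueued."""
        state = self._states.get(repo)
        if state is None:
            return False
        if state.pending:
            # Webhooks arrived during the run: process the latest head exactly once.
            state.pending = False
            state.running_since = self._clock()
            self._enqueue_batch(repo)
            return True
        state.running_since = None  # release the lock
        return False
```

The webhook handler would call `on_webhook(repo)` and the batch's success callback would call `on_batch_success(repo)`. With multiple web or worker processes the same state would have to live in a shared store with atomic updates, and a late success callback from a batch whose lock has already expired is not handled here.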