contribsys / faktory

Language-agnostic persistent background job server
https://contribsys.com/faktory/

How to iterate over jobs in dead queue? #442

Closed hunterp closed 1 year ago

hunterp commented 1 year ago

Use case: I have some jobs that failed and are now in the dead queue. I can write a small script that will "fix" the state so I can retry the jobs, but I need some way to programmatically iterate over the dead jobs and read the job arguments. How do I do this?

Basically, I'm looking for this functionality that exists in Sidekiq (https://github.com/sidekiq/sidekiq/wiki/API#dead). I know the Mutate API will let me retry the jobs, but I can't figure out how to actually get the job JSON payload.

mperham commented 1 year ago

You cannot. The Mutate API provides the only options available today.
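For readers unfamiliar with what the Mutate API does offer, here is a minimal sketch of building a `MUTATE` command that requeues dead jobs by type or by JID. The wire format (`cmd`, `target`, and a `filter` with `jobtype` and `jids`) is written as recalled from the Faktory Mutate API wiki page and should be checked against the current docs; `build_mutate_command` and the job type `ImportFileJob` are hypothetical names used only for illustration. Note that it selects jobs by filter and never returns their payloads, which is the limitation discussed in this thread.

```python
import json

# Command and target names as recalled from the Faktory Mutate API wiki;
# verify them against the version of Faktory you run.
VALID_COMMANDS = {"clear", "kill", "discard", "requeue"}
VALID_TARGETS = {"retries", "scheduled", "dead"}


def build_mutate_command(cmd, target, jobtype=None, jids=None):
    """Return the protocol line for a MUTATE operation (hypothetical helper)."""
    if cmd not in VALID_COMMANDS:
        raise ValueError(f"unknown mutate command: {cmd}")
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown mutate target: {target}")

    operation = {"cmd": cmd, "target": target}

    # The filter narrows which jobs in the target set are affected.
    job_filter = {}
    if jobtype:
        job_filter["jobtype"] = jobtype
    if jids:
        job_filter["jids"] = list(jids)
    if job_filter:
        operation["filter"] = job_filter

    return "MUTATE " + json.dumps(operation, separators=(",", ":")) + "\r\n"


if __name__ == "__main__":
    # Requeue every dead job of one (hypothetical) type.
    print(build_mutate_command("requeue", "dead", jobtype="ImportFileJob"), end="")
```

The resulting line would be written to an authenticated Faktory connection by whichever client library you use; the sketch deliberately stops at building the command so it can be read without a running server.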

hunterp commented 1 year ago

Thanks for the quick response, @mperham. Do you have any recommendations for how to accomplish this today, or would this make a good feature request for future versions of Faktory Enterprise?

mperham commented 1 year ago

I think your best bet is to design your code to work within the current limitations. I wanted to limit the Mutate API to avoid race conditions with dead set processing and implement only features which scaled well with the size of the dead set.

You are welcome to put in a feature request, just please give specific details about your use case so I can better understand what you are trying to achieve and why it's not possible today.

hunterp commented 1 year ago

Hey Mike, what is the best way to get you a feature request? The use case here is the same as the use case for the sidekiq feature.

mperham commented 1 year ago

Open an issue. I want to hear WHY you need the feature. “Just like Sidekiq” is not sufficient; Faktory is a different beast.

hunterp commented 1 year ago

A different issue than this? The reason I opened this issue was to ask for this feature.

Why I need this feature: On a regular basis, jobs end up in the dead queue in Faktory, usually because they hit a bug around some state in the application: for example, a file was corrupted during download, or a customer provided input data in an unexpected format.

Before I can retry those jobs, I need to go and delete that corrupted file from blob storage, or manually transform the input data. I do not want to build this fix logic directly into the application code, as it is typically very one-off and I would want to know if this type of error ever happens again. I don't want my application code to have lots of exception handling for exceptions that I honestly never expect to happen again.

Once we have investigated the dead jobs and determined the fix, I want to be able to script the fix and apply it to all the jobs that died in a similar fashion. However, in order to script the fixing of these cases and retry the jobs, I need programmatic access to the dead job queue including the ability to

It would be sufficient if the Faktory HTTP server exposed a JSON API that provided the same data as the morgue page and an endpoint to "Retry Now" a job.

Please let me know if you need more details or information about this use case. I'm happy to share anything more that would be helpful.

mperham commented 1 year ago

It sounds like you're trying to bolt onto Faktory something that should be a workflow in your application. Faktory does not support payload mutation. You'd create a new job to reprocess the blob once your code is fixed.
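A rough sketch of what "create a new job" could look like once the underlying data has been repaired. It assumes the application keeps its own record of the arguments that need reprocessing, since Faktory will not hand back the dead job's payload. The `jid`, `jobtype`, `args`, and `queue` fields follow the Faktory job payload format as I understand it; `build_push_command`, `ImportFileJob`, and the blob path are hypothetical.

```python
import json
import secrets


def build_push_command(jobtype, args, queue="default"):
    """Return the protocol line that enqueues a brand-new job (hypothetical helper)."""
    job = {
        # A fresh random job ID; the dead job's JID is not reused.
        "jid": secrets.token_hex(12),
        "jobtype": jobtype,
        "args": list(args),
        "queue": queue,
    }
    return "PUSH " + json.dumps(job) + "\r\n"


def build_reprocess_commands(jobtype, fixed_records):
    """Build one PUSH line per record the application has already repaired.

    `fixed_records` is assumed to come from the application's own storage,
    for example a table of failed imports, not from Faktory.
    """
    return [build_push_command(jobtype, record["args"]) for record in fixed_records]


if __name__ == "__main__":
    records = [{"args": ["blobs/customer-42/input.csv"]}]
    for line in build_reprocess_commands("ImportFileJob", records):
        print(line, end="")
```

The design point is that the source of truth for "what needs to be retried" lives in the application rather than in the dead set, so the stale dead jobs can simply be discarded after the new jobs are pushed.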