chaps-io / gush

Fast and distributed workflow runner using ActiveJob and Redis
MIT License
1.05k stars 106 forks source link

Add Gush::Job#skip! (refresh) #109

Open natemontgomery opened 8 months ago

natemontgomery commented 8 months ago

I went ahead and refreshed @ace-subido's original PR (https://github.com/chaps-io/gush/pull/66) with his permission. He did the heavy lifting, so props to him, I just added the integration spec requested by @pokonski and merged in the latest master.

From the original discussion in https://github.com/chaps-io/gush/issues/65 I gather there was some divergence of opinions on just what the semantics should be for the skipped state. Personally I think there is a clear space for a state that involves halting execution of a Job within a Workflow but not the Workflow itself (ie not failed).

I also can see the reason for thinking that there is a generalization of this semantic to more of the Workflow lifecycle (ie skip_remaining! as some suggested). I have noodled a bit on this idea and have some initial attempts that I think might get somewhere but I think it might be worth getting this more Job specific skip method in first.

Welcome all feedback of course!

natemontgomery commented 8 months ago

@krzyzak if you want to have a fresh discussion since original was so old I can go ahead and update the Issue or start a new one. For my use case this does help simplify the conditions in my workflows so its a win for me but I know that is just the one case :)

natemontgomery commented 4 months ago

@pokonski I brought this back up to date with master. Would you like to discuss this some more before moving forward?

pokonski commented 3 months ago

Hi @natemontgomery sorry for the late reply, somehow it slipped through the cracks of my inbox! Will get back to you on this soon, catching up :)

pokonski commented 3 months ago

The first concern I have is about downstream jobs that might depend on the payload returned from a skipped job. Not sure what the right approach here might be, pass some info into the payload that this job skipped so dependent jobs can act accordingly?

natemontgomery commented 3 months ago

@pokonski That is a fair concern. I have not used skip in a job that needs a payload to be passed down, as my later jobs only ever depend on the side effect on another record in my case.

A workflow designed around a payload that goes to a dependent job seems like it would definitely get caught in an exception case if an implementer were to skip the running job.

I think your initial idea of 'pass the state forward' seems intuitive and reasonable. If for example a DownloadVideo job were to be skipped an EncodeVideo job that looked for a payload output would not have a way to know what file path would be expected, at least not generally. So, if we put an indicator that our DownloadVideo parent job was 'skipped' then we could add logic to the EncodeVideo job that avoided an error.

This does bring up a bit of a conundrum though, since any output into payload for a child job would be a possible dependency then the dependency chain would possibly just have to stop at the skipped job. This outcome would be analogous to a 'skip_remaining' type call that some suggested in the initial issue that motivated the original PR.

This would halt the rest of the workflow, which is not my goal for the 'skipped' job state itself, but it would solve the problem of payload discrepancies. The idea could be documented as 'you can skip jobs as long as you do not use a payload, otherwise use a skip_remaining call'.

I think this is a decent short-term approach, but it might be good to find a way to refactor the 'payloads' to allow them to react to state changes and provide more flexible interfaces. That does feel like a larger lift though.

For now, I think since the original discussion already brought up the 'skip_remaining' (or 'skip_workflow') idea and it seemed to be clicking for people. I think this suggests that as part of this PR we do need to implement that idea to better account for more edges implementers could face. I also think this addition will probably still need some more discussion before we move forward but I am glad to get some feedback :)

TL;DR I will try out adding state information into the 'payloads'. Also, I plan to add an implementation of a skip_remaining_jobs method that will let a user who has jobs that depend on the current one halt the workflow. Any further feedback before then is very welcome though!

pokonski commented 3 months ago

Thanks for the detailed response! I agree that initially we can go with just adding some data to the payload, so child jobs can act on it. Since it's a new feature, we can collect feedback and not break anything for existing users :)

pokonski commented 1 month ago

Looks good on my end! Can you resolve conflicts? I'll get that merged

natemontgomery commented 1 month ago

You got it! I have been looking to get back to this, will try and get a further update for this behavior as well.

natemontgomery commented 1 month ago

Ok conflicts resolved. I will try and get an update to push the skipped state into the output of the jobs soon. I had some sketched-out stuff but got busy and didn't get tests done.