ajvb / kala

Modern Job Scheduler
MIT License
2.13k stars 188 forks source link

Prevent deleted job to run (with test) #257

Closed josejuanmontiel closed 2 years ago

josejuanmontiel commented 2 years ago

Following with comments in https://github.com/ajvb/kala/pull/245 this PR has cherry-pick the code and adding test (and now with https://github.com/ajvb/kala/pull/249 merged) should pass (the lint)

josejuanmontiel commented 2 years ago

Need extra work to be notice:

tooolbox commented 2 years ago

Thanks @josejuanmontiel I will take a look.

josejuanmontiel commented 2 years ago

Hi @tooolbox the test (with race detection) sometimes detect another (TestRemoteJobBadStatus need a lock on final test too) other doesn't... check this https://app.circleci.com/pipelines/github/josejuanmontiel/kala?branch=feature%2Fprevent%2Btest same commit c860372 , first fail... sencond (31) works...

About "delete" i think we can talk about now...

but i'm going to start to play (https://github.com/equilibristofgo/sandbox/tree/feat/mutex_example/05_race_condition) with an aprox that do the things without need so much locks... i'll try for fun... PR coming soon :)

tooolbox commented 2 years ago

Hi @tooolbox the test (with race detection) sometimes detect another (TestRemoteJobBadStatus need a lock on final test too) other doesn't... check this https://app.circleci.com/pipelines/github/josejuanmontiel/kala?branch=feature%2Fprevent%2Btest same commit https://github.com/ajvb/kala/commit/c8603729e03d14c72b334d2d0756fe3848453c64 , first fail... sencond (31) works...

Okay. What I understand is that TestRemoteJobBadStatus is racy because the same commit failed & succeeded on different runs, with no changes.

Seems like the easy fix is to make https://github.com/josejuanmontiel/kala/blob/c8603729e03d14c72b334d2d0756fe3848453c64/job/job_test.go#L1103 needs look like https://github.com/josejuanmontiel/kala/blob/c8603729e03d14c72b334d2d0756fe3848453c64/job/job_test.go#L1083 (correct me if I'm wrong).

About "delete" i think we can talk about now...

What is needed on "delete"?

but i'm going to start to play (https://github.com/equilibristofgo/sandbox/tree/feat/mutex_example/05_race_condition) with an aprox that do the things without need so much locks... i'll try for fun... PR coming soon :)

I see. In the meantime we should proceed with this PR, correct?

At this point I am going to re-review the code of this PR.

josejuanmontiel commented 2 years ago

Everything looks good to me, barring the global mutex. I think you were going to change it to a per-job mutex, correct?

I think it's not going to be trivial... if you want to try yourself ... i don't want to commit again until solve the problem

(However, if the problem is because of improper/lacking mutex usage in a test, it may be a better idea to fix the test?)

To solve the random ok in test, we need fix the test with lock arround the link where the test check the value... but it's not related with remove global lock.

Otherwise I think we are very close to merging.

I think i'm going to need more time to change the global for other kind/place of lock.

josejuanmontiel commented 2 years ago

@tooolbox what do you think... i can't do that this three test TestRemoteJobRunner TestRemoteJobBadStatus TestRemoteJobBadStatusSuccess works with the fix of the delete... and avoid race condition (that could be a false positive if you think... the race span a lot of gorutines with the code of the test that couldn't happen in the reality because the start of cache to solve the real delete)

Or global mutex... or place in another file and skip //go:build !race // +build !race ... and later try to refactor in another way.

Later... this night i upload one commit with this aproach.

josejuanmontiel commented 2 years ago

done... let's talk :) (meanwhile i'll think a way to remove all locks but... for other PR)

tooolbox commented 2 years ago

Fascinating. So the Remote Job tests were racy, but not because of something within Kala itself.

I mean, it now looks somewhat obvious, but you handled it cleverly.

tooolbox commented 2 years ago

Merged. Thanks for the excellent work and all of your valuable time :)