ajvb / kala

Modern Job Scheduler
MIT License
2.11k stars 187 forks source link

Prevent deleted job to run #245

Closed n10ty closed 1 year ago

n10ty commented 3 years ago

This PR solving #237

As you can see in the original code: first job run, then checked Is It exists in the cache. We don't do anything if the job deleted: neither try to run nor reschedule it.

tooolbox commented 3 years ago

Sorry, although I have done some maintenance work on this repo in the past I have not had much time recently and @ajvb has been absent.

Two points:

  1. The lint check is failing. (If this is because of #249 excuse me and I'll take the time to pull that one in.)
  2. I would like to see two commits: one with a failing test that reproduces this scenario, and a commit with your changes that fixes the failing test.

That's maybe a lot to ask but it would make me more comfortable re merging this. Thanks in advance.

josejuanmontiel commented 3 years ago

This isn't a test or two commits :) only a logs that i stored from yesterday

./kala serve --jobdb=postgres --jobdb-address=localhost:5432/db?sslmode=disable --jobdb-username=postgres --jobdb-password=12345678 --persist-every=360 INFO[0247] Job backend::xxx:e9272252-1221-4e8c-5e65-be9c0420ce8f repeating in 9.587743219s INFO[0257] Job backend::xxx:e9272252-1221-4e8c-5e65-be9c0420ce8f repeating in 2m59.984205812s INFO[0437] Job backend::xxx:e9272252-1221-4e8c-5e65-be9c0420ce8f repeating in 2m59.980811475s INFO[0481] Deleting e9272252-1221-4e8c-5e65-be9c0420ce8f INFO[0481] Job backend::xxx:50368b08-99cb-4eca-4c4a-eb63aa2ae235 repeating in 9.496309062s INFO[0491] Job backend::xxx:50368b08-99cb-4eca-4c4a-eb63aa2ae235 repeating in 2m59.982445547s WARN[0617] Job backend::xxx with id e9272252-1221-4e8c-5e65-be9c0420ce8f ran, but seems to have been deleted from cache; results won't be persisted. INFO[0617] Job backend::xxx:e9272252-1221-4e8c-5e65-be9c0420ce8f repeating in 2m59.979871709s INFO[0671] Job backend::xxx:50368b08-99cb-4eca-4c4a-eb63aa2ae235 repeating in 2m59.971442338s WARN[0797] Job backend::xxx with id e9272252-1221-4e8c-5e65-be9c0420ce8f ran, but seems to have been deleted from cache; results won't be persisted.

git cherry-pick e6c8d30 [develop 573129d] Prevent deleted job to run Author: Andrii Mazurian a.mazuryan@gmail.com Date: Fri Jan 22 22:41:34 2021 +0200 3 files changed, 33 insertions(+), 10 deletions(-)

./kala serve --jobdb=postgres --jobdb-address=localhost:5432/db?sslmode=disable --jobdb-username=postgres --jobdb-password=12345678 --persist-every=360 INFO[0016] Job backend::xxx:6dcc0d4b-a9b8-4337-63b1-acc0a9e9162a repeating in 9.652610885s INFO[0025] Job backend::xxx:6dcc0d4b-a9b8-4337-63b1-acc0a9e9162a repeating in 2m59.969222396s INFO[0205] Job backend::xxx:6dcc0d4b-a9b8-4337-63b1-acc0a9e9162a repeating in 2m59.956198826s INFO[0253] Deleting 6dcc0d4b-a9b8-4337-63b1-acc0a9e9162a INFO[0253] Job backend::xxx:eaa973b1-121d-4e34-6c96-c99bd031f8e2 repeating in 9.80399487s INFO[0263] Job backend::xxx:eaa973b1-121d-4e34-6c96-c99bd031f8e2 repeating in 2m59.980989207s INFO[0385] Job backend::xxx with id 6dcc0d4b-a9b8-4337-63b1-acc0a9e9162a tried to run, but exited early because its deleted INFO[0443] Job backend::xxx:eaa973b1-121d-4e34-6c96-c99bd031f8e2 repeating in 2m59.982147681s

OUCHUNYU commented 2 years ago

@tooolbox @n10ty When are we gonna get this change in? I'm currently sort of blocked by this.

OUCHUNYU commented 2 years ago

Actually I found an issue @n10ty if you have a job running, say once every 5 min, before the next job runs, stop the server, and wait for > 5 min, restart the server, you will see the job runs before the job is added to the cache, therefore you will not have the job run at all. I moved err := c.Set(j) before

if j.ShouldStartWaiting() {
        j.StartWaiting(c, false)
}

Then everything works as expected.

tooolbox commented 2 years ago

I pulled in the lint-fixing PR from @josejuanmontiel

Since the logic for dealing with jobs has become quite complex, the only sane path forward is to have unit tests for every issue, showing how it's broken--and then fixed by a patch.

n10ty commented 2 years ago

Sorry, guys, I don't have enough time to work on it. I'm using forked version in production. Even now I don't really like how it works: after deletion job still remains in cache and just skips during next run. Feel free to make changes and add test to cover all cases. Also, I'm thinking about moving to GCP scheduler in future.

josejuanmontiel commented 2 years ago

Hi, i think the test in #255 should do the trick to show the error hope this could be sufficient... @tooolbox

josejuanmontiel commented 2 years ago

Hi @tooolbox now with lint errors in #255 resolved ... happy new year ;)

tooolbox commented 2 years ago

Hey @josejuanmontiel I think you fixed the linting in your last PR, if you want to update this one?

josejuanmontiel commented 2 years ago

Hi @tooolbox this PR (#245 it's owned by @n10ty) the logs to review the link has expired... in #255 there are only the test that now fails ... and later (with #245) works... i had checked in my repository in this branch https://github.com/josejuanmontiel/kala/commits/feature/prevent%2Btest where i cherry-pick the commits of this PR and the test of #255 and lint passed then https://github.com/josejuanmontiel/kala/actions/runs/1613584482 if you want.. and @n10ty don't mind i can increment #255 with #245... but this should pass the lint (sure the error was related with #249 .

tooolbox commented 1 year ago

@josejuanmontiel

Sorry for the delay.

Sure, sounds good.

josejuanmontiel commented 1 year ago

@tooolbox take a look if you have time... this new pr as we said

Need extra work to be notice:

Hope this finally make this done.

I offer (hope to have some free time for this) to help in the rest of PR

Kala its a good colection of good code, that need some care to be up-today

tooolbox commented 1 year ago

Handled by #257