Open HK-Mattew opened 4 months ago
I see two problems in the code:
1. The stop() method is a bit too forceful (by default), and should probably not cancel the scheduler's task group.
However, the expected behavior when the job is completed normally is:
However, in my case, I called Scheduler.stop before the job had completed. The scheduler then logged "Job ...id... completed successfully", so the job did finish. However, the job was not deleted from the datastore and the task's running_jobs field was not decremented.
My question is, what is causing the scheduler not to delete the job at the end of its execution and also why did it not decrement the task's running_jobs field?
2. The job acquisition operation isn't exactly atomic on MongoDB (I don't know how to actually accomplish that given its nature), but a cancellation will abort the operation halfway through.
In pymongo you can use Transactions within a session. But I don't think it's necessary in this case, because the task was not canceled or got an error. The job was completed successfully.
If I'm leaving anything unnoticed please let me know.
In pymongo you can use Transactions within a session. But I don't think it's necessary in this case, because the task was not canceled or got an error. The job was completed successfully.
Transactions don't work on a single node Mongo server.
My question is, what is causing the scheduler not to delete the job at the end of its execution and also why did it not decrement the task's running_jobs field?
Because stop() currently cancels all the task groups within the scheduler, and there is no shielding to prevent the job release operation from being cancelled.
In pymongo you can use Transactions within a session. But I don't think it's necessary in this case, because the task was not canceled or got an error. The job was completed successfully.
Transactions don't work on a single node Mongo server.
You are right. So that would be the difficulty because not everyone uses MongoDB with more than one node :/
I use MongoDB with more than one node. So I think I would have to try to adapt something like MongoDBDataStore(allow_transaction=True)
My question is, what is causing the scheduler not to delete the job at the end of its execution and also why did it not decrement the task's running_jobs field?
Because stop() currently cancels all the task groups within the scheduler, and there is no shielding to prevent the job release operation from being cancelled.
Understood.
In pymongo you can use Transactions within a session. But I don't think it's necessary in this case, because the task was not canceled or got an error. The job was completed successfully.
Transactions don't work on a single node Mongo server.
You are right. So that would be the difficulty because not everyone uses MongoDB with more than one node :/
I use MongoDB with more than one node. So I think I would have to try to adapt something like MongoDBDataStore(allow_transaction=True)
How would that help users with just one node?
Unfortunately this would still be a problem, since transactions would only work for those who used allow_transaction=True.
I found a temporary solution to the problem I'm facing. Just save the jobs that were running when the scheduler stopped and use the .release_job method manually, like this:
asyncio.run(
scheduler.data_store.release_job(
...
)
)
This worked correctly even after the scheduler stopped.
This is a dangerous looking "fix". You should be aware that I'm currently in the process of refactoring the stop() method to allow the scheduler to shut down more gracefully, allowing jobs to complete properly if they do so within the allotted time. I'm also considering shielding the release operations from CancelScope cancellation if that looks like it makes sense.
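To illustrate the "complete within the allotted time" idea, here is a minimal sketch using only the standard library's asyncio. It is not APScheduler's implementation: the names job, stop_gracefully and grace_period are hypothetical and chosen only for this example. Running jobs get a grace period to finish on their own, and only the stragglers are cancelled.

```python
import asyncio


async def job(name, duration, finished):
    # Hypothetical job body: sleep for a while, then record completion.
    await asyncio.sleep(duration)
    finished.append(name)


async def stop_gracefully(tasks, grace_period):
    # Give running jobs up to grace_period seconds to finish on their own.
    if not tasks:
        return 0, 0
    done, pending = await asyncio.wait(tasks, timeout=grace_period)
    # Only jobs that overran the grace period are cancelled.
    for task in pending:
        task.cancel()
    # Wait for the cancelled jobs to unwind before reporting back.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def main():
    finished = []
    tasks = [
        asyncio.ensure_future(job("quick", 0.01, finished)),
        asyncio.ensure_future(job("slow", 30, finished)),
    ]
    completed, cancelled = await stop_gracefully(tasks, grace_period=0.2)
    return finished, completed, cancelled


result = asyncio.run(main())
print(result)
```

In this sketch the quick job finishes inside the grace period and is left alone, while the slow one is cancelled once the period expires.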
It's not really the best solution. But it would help temporarily.
Your idea about CancelScope sounds good. I hope it works well :)
[Another idea] One idea I had would be to work with signals in the Scheduler.
Example:
class Scheduler():
...
scheduler = Scheduler()
scheduler.send_signal('stop running new jobs')
"""
I wait until no jobs are running in the scheduler and then use the scheduler.stop() method.
With this, the scheduler would be able to process the job deletion operations after they are executed,
and also decrement the task running_jobs field.
"""
assert len(scheduler._async_scheduler._running_jobs) == 0
scheduler.stop()
This seems like a good solution to the current problem.
The stop() method already sets the scheduler state to stopping, which signals to the background tasks that they should exit their respective loops. Unfortunately, currently there are background tasks which sleep for certain periods of time, and I have to find a way to safely interrupt these tasks in order to allow their task groups to exit.
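One common way to make a sleeping background loop interruptible, sketched here with plain asyncio and not taken from APScheduler's code, is to wait on an event with a timeout instead of sleeping. The names background_loop, stop_event and interval are assumptions made for this example.

```python
import asyncio


async def background_loop(stop_event, interval, ticks):
    # Do one unit of work, then wait up to `interval` seconds for a stop request.
    while not stop_event.is_set():
        ticks.append("tick")
        try:
            # Returns as soon as stop_event is set, instead of sleeping it out.
            await asyncio.wait_for(stop_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            continue  # the interval elapsed without a stop request


async def main():
    stop_event = asyncio.Event()
    ticks = []
    task = asyncio.ensure_future(background_loop(stop_event, 60.0, ticks))
    await asyncio.sleep(0)  # let the loop run its first iteration
    loop = asyncio.get_running_loop()
    started = loop.time()
    stop_event.set()  # request a stop while the loop is waiting
    await task  # the loop exits without being cancelled
    return ticks, loop.time() - started


ticks, elapsed = asyncio.run(main())
print(ticks, elapsed)
```

The loop here exits on its own shortly after the event is set, even though its interval is 60 seconds, so nothing has to be cancelled to get it out of its wait.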
I just checked and, indeed, the AsyncScheduler._process_jobs method has a condition to only run with RunState.started.
So, I think the best bet would be your idea about CancelScope.
Man, I'd like to take this opportunity to thank you for your great work. This new version of APScheduler is looking amazing. I really like it. 😉
So, I think the best bet would be your idea about CancelScope.
I'm not sure we're on the same page here. I brought up CancelScope because those could be used to shield certain sensitive operations (like releasing a job) from cancellation.
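As a rough illustration of what shielding a release operation means, the sketch below uses the standard library's asyncio.shield as an analogue. With AnyIO the equivalent would be wrapping the release in `with CancelScope(shield=True):`. This is not APScheduler's code: release_job, run_job and the store dictionary are hypothetical stand-ins for the datastore operations discussed in this thread.

```python
import asyncio


async def release_job(store, job_id, started):
    # Hypothetical stand-in for a datastore release:
    # decrement the running counter and delete the job record.
    started.set()
    await asyncio.sleep(0.01)  # simulated datastore round trip
    store["running_jobs"] -= 1
    store["jobs"].discard(job_id)


async def run_job(store, job_id, started, shielded):
    # The job body has already finished; only the release step remains.
    release = asyncio.ensure_future(release_job(store, job_id, started))
    if not shielded:
        await release  # cancelling run_job also cancels the release
        return
    try:
        await asyncio.shield(release)
    except asyncio.CancelledError:
        await release  # let the release finish, then propagate the cancellation
        raise


async def demo(shielded):
    store = {"running_jobs": 1, "jobs": {"job-1"}}
    started = asyncio.Event()
    task = asyncio.ensure_future(run_job(store, "job-1", started, shielded))
    await started.wait()  # the release is now in flight
    task.cancel()  # simulates stop() cancelling the task group
    await asyncio.gather(task, return_exceptions=True)
    return store


unshielded = asyncio.run(demo(shielded=False))
shielded = asyncio.run(demo(shielded=True))
print(unshielded, shielded)
```

Without the shield, the cancellation interrupts the release halfway and the store keeps the stale job and counter, which matches the symptom reported in this issue. With the shield, the release runs to completion before the cancellation propagates.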
Man, I'd like to take this opportunity to thank you for your great work. This new version of APScheduler is looking amazing. I really like it. 😉
Thanks! Always nice to see one's work appreciated!
Things to check first
[X] I have checked that my issue does not already have a solution in the FAQ
[X] I have searched the existing issues and didn't find my bug already reported there
[X] I have checked that my bug is still present in the latest release
Version
4.0.0a5
What happened?
Hello,
I would like to report a bug that occurs after stopping the Scheduler. From what I have noticed, the bug occurs when I stop the scheduler while there is a job running.
Summary: While a job is still running, I use the scheduler's .stop method. I wait until the scheduler.state is in the stopped state. I see that when it reaches the scheduler.state.stopped state, the job that was running before I stopped the scheduler is completed successfully. However, some operations remain pending in the DataStore, such as decreasing the running_jobs of the task document, and the job is not deleted from the job collection.
Tested only with MongoDBDataStore
How can we reproduce the bug?
Code to replicate the bug:
My logs running the sample code: