Closed doluk closed 2 days ago
After some runtime some of the schedules are simply gone, the task itself had no errors and ensures in theory always a rescheduling. Meaning the number of schedules drops constantly.
Schedules are supposed to be deleted when they've run their course. Is this what you're seeing? Without a reproducing example I can't say much more.
Yes sure, in my case they should add a new schedule with the same name at the end of the task. I referred to the non existence of the new schedules
I was able to reproduce this bug also with 4.0.0a3. Posting my example now. Just a quick comment: if you use await scheduler.add_job(create_schedules)
you get directly the error with the traceback above. Using await create_schedules()
instead raises no exception and you dont see anything in the apscheduler logs, but the other logs (bug_demo.log
) show that the number of schedules is decreasing over time.
Could you try with v4.0.0a4?
I upgraded apscheduler, deleted the old tables in the database and tried to use the same code as before, but getting now this error:
2023-11-13 08:17:24,598:ERROR-bug_demo:hon3.10/site-packages/apscheduler/_schedulers/async_.py", line 737, in _process_schedules
async_ - _process_schedules - 739:
Error computing next fire time for schedule 'check_self' of task '__main__:check_self' – removing schedulen next
self._last_fire_time += self._interval
Traceback (most recent call last):r (not "datetime.timedelta") to str
20File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 737, in _process_schedules
bug_fire_time = calculate_next()
20File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/triggers/interval.py", line 67, in next
jobsself._last_fire_time += self._intervalules: 0, regular_check jobs: 0
TypeError: can only concatenate str (not "datetime.timedelta") to str
sorry my phone messes with the formating in the terminal, i can get you a properly formatted one later
That looks like there are strings leaking to where only datetimes should go. Can you provide a simpler way to reproduce the problem that you're seeing?
The error is from the Interval trigger, just create a schedule with an interval trigger. It executes ones and then creates this error
It works with the memory store, but with the sqlalchemy store I'm getting a bizarre (but different) error about the function reference which I'm investigating.
That was a false alarm, it was caused by a leftover schedule from another run. When I cleared the database, it ran perfectly. Is the PostgreSQL standalone example not working for you?
I created a whole new database and run the standalone example, which works fine for me!
I then cleared the whole database and used my code again. Still getting the same error. So I started slowly changing the code of the postgresql standalone example towards my code. And with the addition of the JSON serializer it breaks and starts throwing this error TypeError: can only concatenate str (not "datetime.timedelta") to str
. I looked a bit into the code and found that if I change interval.py:45 to _last_fire_time: datetime | None = attrs.field(init=False, eq=False, default=None, converter=as_aware_datetime)
it works without the previously mentioned TypeError. I tested with this change the Cbor and Pickle serializer as well, which work fine for both.
Ok, I missed your use of JSONSerializer there. I've actually just started a discussion on this, and would like to know your reasons for a serializer other than pickle
.
Replied there as well. For me it was more debugging, to understand what arguments are going in. I checked my initial report by the way with the pickle serializer and the problem still persists.
And I think I found the problem causing this bug of the disappearing schedules.
My example code to reproduce it creates 2000 times the same schedules and numerating them. It first works find but then stops working after around 20 minutes. Every schedule schedule another schedule with the same id regular_check-{number}
during the execution. I enabled the logging of apscheduler and counted the invokes and the job completed. Those were always 2000 for every round of schedules. To further understand what is happening I modifed the code of the schedule regular_check. I try to get the old_schedule, which is currently excuted and the new one after it got scheduled. To distinguish them, I added a counter to the id, counting the executions. Now there are not longer any schedules disappearing (meaning the count of schedules is 2000 regular_check plus 1 the self_check). Curious by that I looked into the logs. It appears that sometimes during the execution the schedule is still present and receivable with scheduler.get_schedule()
and sometimes not. I suspect that the newly scheduled schedule is somehow getting deleted at the end of the job execution, because they have the same id.
async def regular_check(check_id: str, scheduler: Optional[ExtendedAsyncScheduler] = None):
if not scheduler:
try:
scheduler: ExtendedAsyncScheduler = apscheduler.current_async_scheduler.get()
except LookupError as e:
raise e
check_id = check_id.replace("regular_check-", "")
[check_no, iteration] = check_id.split("+")
iteration = int(iteration)
try:
old_schedule = await scheduler.get_schedule(f"regular_check-{check_id}")
except ScheduleLookupError:
old_schedule = None
try:
run_date = datetime.now(tz=timezone.utc) + timedelta(minutes=5)
await scheduler.add_schedule(regular_check,
DateTrigger(run_time=run_date),
id=f"regular_check-{check_no}+{iteration + 1}",
misfire_grace_time=300,
conflict_policy=apscheduler.ConflictPolicy.exception,
args=(f"{check_no}+{iteration + 1}",),
max_jitter=30)
try:
new_schedule = await scheduler.get_schedule(f"regular_check-{check_no}+{iteration + 1}")
except ScheduleLookupError:
new_schedule = None
scheduler.logger.info(f"Old schedule: {old_schedule=}\nNew schedule: {new_schedule=}" +
f"\nAdded regular check {check_id} {iteration+1} at {run_date}")
except Exception as e:
scheduler.logger.error(f"Error in regular check {check_id}\n{traceback.format_exc()}")
async def create_schedules(scheduler: Optional[ExtendedAsyncScheduler] = None):
if not scheduler:
try:
scheduler: ExtendedAsyncScheduler = apscheduler.current_async_scheduler.get()
except LookupError as e:
raise e
db: asyncpg.Pool = scheduler.db
try:
async with db.acquire() as conn:
for i in range(2000):
x = str(i)
while len(x) < 4:
x = "0" + x
try:
schedule = await conn.fetchrow("SELECT * from jsonserializer.schedules where id ilike $1",
f"regular_check-{x}%")
except Exception as e:
traceback.print_exc()
raise e
if not schedule:
await scheduler.add_job(regular_check, args=(f'{x}+0',))
scheduler.logger.error("Finished creating schedules")
except Exception as e:
scheduler.logger.error(f"Error in create_schedules\n{traceback.format_exc()}")
Schedules are always deleted when their triggers don't produce any more fire times, that is normal. Is that what you mean?
Schedule 1 with a datetrigger is executed, during the execution a new schedule 2 with the same id is created. Sometimes during the execution of the job of schedule 1, the schedule 1 is still accessible by the id, sometimes not. If it is present, somehow the schedule 2 disappears after the job of schedule 1 finished.
By using real unique ids this can be avoided. Sadly the part with the KeyError still happens
Sometimes during the execution of the job of schedule 1, the schedule 1 is still accessible by the id, sometimes not
Is your expectation that the schedule is available as long as there are any jobs active associated with it?
As long as it is consistent I am okay with either one. I would prefer it not being available unless it produces another runtime. Because the schedule exhausted and therefore the schedule id should be free to use again.
That sounds reasonable. The reason why it works the way it does now is that _process_schedules()
creates the jobs, and only after creating the necessary jobs does it remove any finished schedules.
This out of the way, the initial reported error still occurs 🙈 During that I also stumbled across another thing: How does apscheduler handle many pending jobs in combination with jobs from schedules? When I add a lot of jobs before the scheduler is started and have a schedule with an interval trigger, I noticed that after starting up the interval schedule/jobs are not executed in that interval. My expectation would be here, that jobs from schedules have priority compared to jobs which are not connected to a schedule
This out of the way, the initial reported error still occurs
Can you clarify what you mean?
My expectation would be here, that jobs from schedules have priority compared to jobs which are not connected to a schedule
The schedulers fetch jobs in the FIFO order, by created_at
. There is no other prioritization occurring.
the initial error was that a KeyError removing job.id from self._running_jobs
the initial error was that a KeyError removing job.id from self._running_jobs
I "fixed" that by doing self._running_jobs.discard(job.id)
instead so it doesn't fail if the key is missing. Not a real fix but at least it's not crashing anymore.
Can you reproduce this with the latest master
? I just ran that originally posted script for half an hour without seeing any errors. The only console output I got was starting scheduler
, and then some logging present in bug_demo.log
, nothing else.
Yes, will have it running over the next few hours
I get no error messages, but the number of jobs is still decaying. After 15 minutes the number of regular_checks droped from 2000 to 911.
So let me just make sure we're on the same page here. Why are you replacing the schedule from a job belonging to that schedule?
I have an api need to call for a list of identifier over and over again. The time of the next call is dependent on the api response for the identifier. The function making the api calls is regular_check
in the example above. Currently the schedule id is regular_check_{identifier}
. and the check_id
of the example is also the identifier.
With APScheduler v3 I used and sqlite jobstore and never had any issues with this way of doing it. I can see that using the same id again causes issues, but I would argue that the caused behavior is not intuitive (especially considering the used conflictpolicy)
If you're using DateTrigger
, then would it not be possible to create a totally new schedule from the job?
Isn't that what I am doing in regular_checks?
Ok, I think I didn't read everything thoroughly, so the later snippet is your solution/workaround for the original problem, correct? That would indeed work. The scheduler deletes a schedule when its trigger can no longer generate any new fire times, and this may happen while the job that was spawned from said schedule is still running. I'm wondering if perhaps it shouldn't be? Perhaps instead finished schedules should be cleaned up by the data store cleanup procedure instead which ensures that only schedules with no running jobs are deleted.
Not really solving, because the schedules still get somehow lost. It seems to be related with the number of schedules/jobs. If you keep them below 800 or so they stay stable for hours. So I assume the issue is caused by some kind of bad timing between cleanups, new schedules being stored and the duration of a job.
While I agree it is somewhat silly and fixable at my side, this behavior is not predictable and shouldn't happen.
Beside the solution you suggested, I can think of another one. The check if the Trigger is not producing another schedule should be rechecked/the trigger should be compared before removal of the schedule, to prevent exactly this
And yes the later snippet with the added incremental counter fixes it because then the change in the trigger is also a change in the schedule
I forgot that this issue should already be fixed in v4.0.0a5. Please let me know either way when you test it.
Still occuring
Hello, get same crashes at random time. Its the scheduler part:
async with GLOBAL_VARS.SCHEDULER as scheduler:
await scheduler.add_schedule(DataUpdater.update_faceit_users, IntervalTrigger(days=1), misfire_grace_time=3600, id='faceit_users_parse', max_running_jobs=1)
await scheduler.add_schedule(DataUpdater.update_lol_matches, IntervalTrigger(minutes=1), misfire_grace_time=15, id='lol_matches_parse', max_running_jobs=1)
await scheduler.add_schedule(DataUpdater.check_not_announced_matches, IntervalTrigger(seconds=30), misfire_grace_time=15, id='check_not_announced_matches', max_running_jobs=1)
await scheduler.add_schedule(DataUpdater.update_faceit_matches, IntervalTrigger(minutes=5), misfire_grace_time=180, id='faceit_matches_parse', max_running_jobs=1)
await scheduler.add_schedule(DISCORD.check_connect, IntervalTrigger(minutes=10), misfire_grace_time=30, id='ds_hearthbeat', max_running_jobs=1)
logging.info('Program has successfully started [DONE]')
await scheduler.start_in_background()
while True:
await asyncio.sleep(3600)
There is the only addition interaction with shcheduler in code:
await GLOBAL_VARS.SCHEDULER.add_job(DataUpdater.update_faceit_match_by_id, kwargs={"match_id": match_id, "player_ids": list(player_ids)})
And sometimes scheduler just crashes with long traceback, but the final part is the same:
self._running_jobs.remove(job.id)
| KeyError: UUID(<HERE RANDOM ID>)
[2024-05-19 06:43:44,559] - async_.py - Job f3805d3f-db30-4383-a695-b0d56cf46d02 completed successfully
[2024-05-19 06:43:44,561] - async_.py - Job fc5999df-a342-4a1e-a323-a5084efa52e7 was cancelled
[2024-05-19 06:43:44,563] - async_.py - Scheduler crashed
+ Exception Group Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 784, in run_until_stopped
| async with create_task_group() as task_group:
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1018, in _process_jobs
| await wakeup_event.wait()
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1651, in wait
| await self._event.wait()
| File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
| await fut
| asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f28f815b610
|
| During handling of the above exception, another exception occurred:
|
| Exception Group Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 988, in _process_jobs
| async with AsyncExitStack() as exit_stack:
| File "/usr/lib/python3.10/contextlib.py", line 714, in __aexit__
| raise exc_details[1]
| File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
| await self.data_store.release_job(self.identity, job, result)
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/datastores/memory.py", line 329, in release_job
| job_state = self._jobs_by_id.pop(result.job_id)
| KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
| self._running_jobs.remove(job.id)
| KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
+------------------------------------
[2024-05-19 06:43:44,573] - main.py - + Exception Group Traceback (most recent call last):
| File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Exception Group Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 784, in run_until_stopped
| async with create_task_group() as task_group:
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Exception Group Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 988, in _process_jobs
| async with AsyncExitStack() as exit_stack:
| File "/usr/lib/python3.10/contextlib.py", line 714, in __aexit__
| raise exc_details[1]
| File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
| self._running_jobs.remove(job.id)
| KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
+------------------------------------
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/root/tusa_daunov/main.py", line 55, in main
| async with GLOBAL_VARS.SCHEDULER as scheduler:
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 178, in __aexit__
| await self._exit_stack.__aexit__(exc_type, exc_val, exc_tb)
| File "/usr/lib/python3.10/contextlib.py", line 714, in __aexit__
| raise exc_details[1]
| File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/root/tusa_daunov/main.py", line 75, in main
| await asyncio.sleep(3600)
| File "/usr/lib/python3.10/asyncio/tasks.py", line 605, in sleep
| return await future
| asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f28f92ef670
|
| During handling of the above exception, another exception occurred:
|
| Exception Group Traceback (most recent call last):
| File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Exception Group Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 784, in run_until_stopped
| async with create_task_group() as task_group:
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1018, in _process_jobs
| await wakeup_event.wait()
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1651, in wait
| await self._event.wait()
| File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
| await fut
| asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f28f815b610
|
| During handling of the above exception, another exception occurred:
|
| Exception Group Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 988, in _process_jobs
| async with AsyncExitStack() as exit_stack:
| File "/usr/lib/python3.10/contextlib.py", line 714, in __aexit__
| raise exc_details[1]
| File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
| await self.data_store.release_job(self.identity, job, result)
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/datastores/memory.py", line 329, in release_job
| job_state = self._jobs_by_id.pop(result.job_id)
| KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
| self._running_jobs.remove(job.id)
| KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
It would be helpful to have a minimum workable example. All the snippets in this issue are either unnecessarily complex or incomplete.
One thing to consider is the run time of the jobs. If the jobs run for longer than 15 minutes, they could trigger a known bug which is present even in v4.0.0a5.
@doluk When you say it's still happening, do you mean the original issue (with all those exceptions) or something else, namely the number of schedules being wrong?
@Soulbadguy54 Could you produce a minimum workable example I could run to reproduce the issue using just the memory data store?
It is happening that schedules go missing, sorry worded it badly
So there are two different issues here: schedules going missing, and the KeyError about job IDs. I'm not sure yet that they're related.
Its can be related, since the scheduler tries to delete missing job or something like this ...
But if the scheduler tries to delete jobs that aren't there, there would be errors like the ones above. I'm not seeing any, however, when I try to reproduce the issue with your script.
So my example runs eith a postgresql database (version 16) without any decaying number of schedules? How many schedules do you let it create? For smaller numbers below 2000, it takes way longer to occur/never does. If I let it create 6000 I can see them drop within 15 minutes
As I recall, I saw a decaying number of schedules the last time I ran it, but I didn't see any KeyError
s as in your original report.
yes no keyerrors anymore
apscheduler==4.0.0a5 . Same error here:
INFO:apscheduler._schedulers.sync:Job b6e86506-c57c-492b-be92-5a1b95329563 completed successfully
ERROR:apscheduler._schedulers.sync:Scheduler crashed
+ Exception Group Traceback (most recent call last):
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 808, in run_until_stopped
| await self.event_broker.publish_local(SchedulerStarted())
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1018, in _process_jobs
| await wakeup_event.wait()
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 1662, in wait
| await self._event.wait()
| File "/usr/lib/python3.8/asyncio/locks.py", line 309, in wait
| await fut
| asyncio.exceptions.CancelledError
|
| During handling of the above exception, another exception occurred:
|
| Exception Group Traceback (most recent call last):
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1019, in _process_jobs
| wakeup_event = anyio.Event()
| File "/usr/lib/python3.8/contextlib.py", line 679, in __aexit__
| raise exc_details[1]
| File "/usr/lib/python3.8/contextlib.py", line 662, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (2 sub-exceptions)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
| await self.data_store.release_job(self.identity, job, result)
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/datastores/memory.py", line 329, in release_job
| job_state = self._jobs_by_id.pop(result.job_id)
| KeyError: UUID('08d6d19f-47b8-4ef6-aae3-efbb3bc90234')
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
| self._running_jobs.remove(job.id)
| KeyError: UUID('08d6d19f-47b8-4ef6-aae3-efbb3bc90234')
+---------------- 2 ----------------
| Traceback (most recent call last):
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
| await self.data_store.release_job(self.identity, job, result)
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/datastores/memory.py", line 329, in release_job
| job_state = self._jobs_by_id.pop(result.job_id)
| KeyError: UUID('b6e86506-c57c-492b-be92-5a1b95329563')
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
| self._running_jobs.remove(job.id)
| KeyError: UUID('b6e86506-c57c-492b-be92-5a1b95329563')
+------------------------------------
My scheduler is simple:
def get_trigger(entry):
return CronTrigger(minute=entry["minute"], hour=entry["hour"], day=entry["day"],
month=entry["month"], day_of_week=entry["day_of_week"],)
scheduler = Scheduler()
for plugin in load_plugins("./plugins"):
...
scheduler.add_schedule(plugin_runner_function, get_trigger(plugin.schedule), args=[plugin.args])
I get the same KeyError error.
If it's important to know, I'm using: apscheduler 4.0.0a5, Mongodb datastore, python3.9.
I see that there are already several other code examples above where the same error is occurring. Because of this, I will not post example code in this post.
Error Log:
Scheduler crashed
+ Exception Group Traceback (most recent call last):
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 808, in run_until_stopped
| await self.event_broker.publish_local(SchedulerStarted())
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1009, in _process_jobs
| jobs = await self.data_store.acquire_jobs(self.identity, limit)
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 542, in acquire_jobs
| async for attempt in self._retry():
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
| do = await self.iter(retry_state=self._retry_state)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
| result = await action(retry_state)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/_utils.py", line 99, in inner
| return call(*args, **kwargs)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/__init__.py", line 392, in <lambda>
| self._add_action_func(lambda rs: rs.outcome.result())
| File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
| return self.__get_result()
| File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
| raise self._exception
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 545, in acquire_jobs
| async with await AsyncCursor.create(
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 126, in create
| cursor = await to_thread.run_sync(func)
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
| return await get_async_backend().run_sync_in_worker_thread(
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2128, in run_sync_in_worker_thread
| await cls.checkpoint()
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2050, in checkpoint
| await sleep(0)
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 641, in sleep
| await __sleep0()
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 630, in __sleep0
| yield
| asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ffa443cb880
|
| During handling of the above exception, another exception occurred:
|
| Exception Group Traceback (most recent call last):
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
| result = coro.send(None)
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1019, in _process_jobs
| wakeup_event = anyio.Event()
| File "/usr/local/lib/python3.9/contextlib.py", line 670, in __aexit__
| raise exc_details[1]
| File "/usr/local/lib/python3.9/contextlib.py", line 653, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (4 sub-exceptions)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
| result = coro.send(None)
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
| self._running_jobs.remove(job.id)
| KeyError: UUID('eef5e05b-03b9-4ec5-9f1a-747847a4bdde')
+---------------- 2 ----------------
| Traceback (most recent call last):
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
| await self.data_store.release_job(self.identity, job, result)
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 630, in release_job
| async for attempt in self._retry():
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
| do = await self.iter(retry_state=self._retry_state)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
| result = await action(retry_state)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/_utils.py", line 99, in inner
| return call(*args, **kwargs)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/__init__.py", line 392, in <lambda>
| self._add_action_func(lambda rs: rs.outcome.result())
| File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
| return self.__get_result()
| File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
| raise self._exception
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 640, in release_job
| await to_thread.run_sync(
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
| return await get_async_backend().run_sync_in_worker_thread(
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2128, in run_sync_in_worker_thread
| await cls.checkpoint()
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2050, in checkpoint
| await sleep(0)
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 641, in sleep
| await __sleep0()
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 630, in __sleep0
| yield
| asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ffa443cb880
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 258, in __step
| result = coro.throw(exc)
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
| self._running_jobs.remove(job.id)
| KeyError: UUID('eef5e05b-03b9-4ec5-9f1a-747847a4bdde')
+---------------- 3 ----------------
| Traceback (most recent call last):
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
| await self.data_store.release_job(self.identity, job, result)
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 630, in release_job
| async for attempt in self._retry():
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
| do = await self.iter(retry_state=self._retry_state)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
| result = await action(retry_state)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/_utils.py", line 99, in inner
| return call(*args, **kwargs)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/__init__.py", line 392, in <lambda>
| self._add_action_func(lambda rs: rs.outcome.result())
| File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
| return self.__get_result()
| File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
| raise self._exception
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 640, in release_job
| await to_thread.run_sync(
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
| return await get_async_backend().run_sync_in_worker_thread(
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2128, in run_sync_in_worker_thread
| await cls.checkpoint()
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2050, in checkpoint
| await sleep(0)
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 641, in sleep
| await __sleep0()
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 630, in __sleep0
| yield
| asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ffa443cb880
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 258, in __step
| result = coro.throw(exc)
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
| self._running_jobs.remove(job.id)
| KeyError: UUID('658dd562-7268-428f-84a4-d9387f3d31db')
+---------------- 4 ----------------
| Traceback (most recent call last):
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
| await self.data_store.release_job(self.identity, job, result)
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 630, in release_job
| async for attempt in self._retry():
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
| do = await self.iter(retry_state=self._retry_state)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
| result = await action(retry_state)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/_utils.py", line 99, in inner
| return call(*args, **kwargs)
| File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/__init__.py", line 392, in <lambda>
| self._add_action_func(lambda rs: rs.outcome.result())
| File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
| return self.__get_result()
| File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
| raise self._exception
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 640, in release_job
| await to_thread.run_sync(
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
| return await get_async_backend().run_sync_in_worker_thread(
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2128, in run_sync_in_worker_thread
| await cls.checkpoint()
| File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2050, in checkpoint
| await sleep(0)
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 641, in sleep
| await __sleep0()
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 630, in __sleep0
| yield
| asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ffa443cb880
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/usr/local/lib/python3.9/asyncio/tasks.py", line 258, in __step
| result = coro.throw(exc)
| File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
| self._running_jobs.remove(job.id)
| KeyError: UUID('658dd562-7268-428f-84a4-d9387f3d31db')
+------------------------------------
I believe this has been fixed in master now. Let me know if it happens again.
Things to check first
[X] I have checked that my issue does not already have a solution in the FAQ
[X] I have searched the existing issues and didn't find my bug already reported there
[X] I have checked that my bug is still present in the latest release
Version
4.0.0a2.post72
What happened?
I know it is an alpha etc. But I wanted to report it, because I didn't observe it with lower number of jobs: I use the apscheduler with a postgresql jobstore and have around 5000 schedules, but the same task, just different arguments. They are scheduled with a DateTrigger and schedule themself at the end of the task for the next execution.
After some runtime some of the schedules are simply gone, the task itself had no errors and ensures in theory always a rescheduling. Meaning the number of schedules drops constantly. I observed this behavior with the thread executor but also the normal async executor. With the thread executor often the complete scheduler would crash occasionally, With the async one at least the scheduler stays alive. But both times the jobs table has an enourmous number of rows. Right now I have around 100k rows in there (I can provide a dump if needed). Also the logs shows the following error multiple times (same error just another scope valueand UUID):
How can we reproduce the bug?