agronholm / apscheduler

Task scheduling library for Python
MIT License
6k stars 693 forks source link

APScheduler 4.0.0 Bug #803

Closed doluk closed 2 days ago

doluk commented 9 months ago

Things to check first

Version

4.0.0a2.post72

What happened?

I know it is an alpha etc. But I wanted to report it, because I didn't observe it with lower number of jobs: I use the apscheduler with a postgresql jobstore and have around 5000 schedules, but the same task, just different arguments. They are scheduled with a DateTrigger and schedule themself at the end of the task for the next execution.

After some runtime some of the schedules are simply gone, the task itself had no errors and ensures in theory always a rescheduling. Meaning the number of schedules drops constantly. I observed this behavior with the thread executor but also the normal async executor. With the thread executor often the complete scheduler would crash occasionally, With the async one at least the scheduler stays alive. But both times the jobs table has an enourmous number of rows. Right now I have around 100k rows in there (I can provide a dump if needed). Also the logs shows the following error multiple times (same error just another scope valueand UUID):

  + Exception Group Traceback (most recent call last):
  |   File "/root/ccn_tracker/scheduler_main.py", line 90, in main
  |     await scheduler.run_until_stopped()
  |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 514, in run_until_stopped
  |     async with create_task_group() as task_group:
  |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 664, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/anyio/streams/memory.py", line 217, in send
    |     self.send_nowait(item)
    |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/anyio/streams/memory.py", line 200, in send_nowait
    |     raise WouldBlock
    | anyio.WouldBlock
    | 
    | During handling of the above exception, another exception occurred:
    | 
    | Traceback (most recent call last):
    |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 700, in _process_jobs
    |     jobs = await self.data_store.acquire_jobs(self.identity, limit)
    |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/datastores/sqlalchemy.py", line 759, in acquire_jobs
    |     await self._event_broker.publish(
    |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/eventbrokers/asyncpg.py", line 179, in publish
    |     await self._send.send(notification)
    |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/anyio/streams/memory.py", line 223, in send
    |     await send_event.wait()
    |   File "/usr/local/lib/python3.10/asyncio/locks.py", line 214, in wait
    |     await fut
    |   File "/usr/local/lib/python3.10/asyncio/futures.py", line 285, in __await__
    |     yield self  # This tells Task to wait for completion.
    |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
    |     future.result()
    |   File "/usr/local/lib/python3.10/asyncio/futures.py", line 196, in result
    |     raise exc
    | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f6a84b10610
    | 
    | During handling of the above exception, another exception occurred:
    | 
    | Exception Group Traceback (most recent call last):
    |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
    |     result = coro.send(None)
    |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 683, in _process_jobs
    |     async with AsyncExitStack() as exit_stack:
    |   File "/usr/local/lib/python3.10/contextlib.py", line 714, in __aexit__
    |     raise exc_details[1]
    |   File "/usr/local/lib/python3.10/contextlib.py", line 697, in __aexit__
    |     cb_suppress = await cb(*exc_details)
    |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 664, in __aexit__
    |     raise BaseExceptionGroup(
    | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (82 sub-exceptions)
    +-+---------------- 1 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 2 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 3 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 4 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 5 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 6 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 7 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 8 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 9 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 10 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 11 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 12 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 13 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 14 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- 15 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
      |     result = coro.send(None)
      |   File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/schedulers/async_.py", line 785, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('7baf47d5-46f3-4c56-b4ca-9fcdc4189250')
      +---------------- ... ----------------
      | and 67 more exceptions
      +------------------------------------

How can we reproduce the bug?

from __future__ import annotations

import asyncio
import asyncpg
import sys
import logging
import traceback
from datetime import datetime, timedelta, timezone
from logging import handlers
from typing import Optional

if sys.version_info >= (3, 11):
    from typing import Self
else:
    from typing_extensions import Self

import apscheduler
from apscheduler._schedulers.async_ import AsyncScheduler
from apscheduler.datastores.sqlalchemy import SQLAlchemyDataStore
from apscheduler.serializers.json import JSONSerializer
from apscheduler.triggers.date import DateTrigger
from apscheduler.triggers.interval import IntervalTrigger
from sqlalchemy.ext.asyncio import create_async_engine

# logging
logger = logging.getLogger('bug_demo')
file_handler = handlers.TimedRotatingFileHandler(filename=f"bug_demo.log",
                                                 when='MIDNIGHT', encoding='utf-8',
                                                 backupCount=0, utc=False)
file_handler.setFormatter(logging.Formatter('%(asctime)s:%(levelname)s-%(name)s:\n'
                                            '%(module)s - %(funcName)s - %(lineno)d:\n'
                                            '%(message)s\n\n'))
file_handler.namer = lambda name: name.replace(".log.", "_") + ".log"
logger.addHandler(file_handler)
logger.setLevel(logging.ERROR)

# dedicated logger for apscheduler
logger2 = logging.getLogger("apscheduler")
file_handler2 = handlers.TimedRotatingFileHandler(filename=f"bug_demo_appscheduler.log",
                                                 when='MIDNIGHT', encoding='utf-8',
                                                 backupCount=0, utc=False)
file_handler2.setFormatter(logging.Formatter('%(asctime)s:%(levelname)s-%(name)s:\n'
                                            '%(module)s - %(funcName)s - %(lineno)d:\n'
                                            '%(message)s\n\n'))
file_handler2.namer = lambda name: name.replace(".log.", "_") + ".log"
logger2.addHandler(file_handler2)
logger2.setLevel(logging.INFO)

cfg = {'user'    : 'ENTER USER HERE',
       'password': 'ENTER PASSWORD HERE',
       'host'    : 'ENTER HOST HERE',
       'database': 'ENTER DATABASE HERE',
       'port': 'ENTER PORT HERE'
       }

class ExtendedAsyncScheduler(AsyncScheduler):
    async def __aenter__(self: Self) -> Self:
        await super().__aenter__()
        self.db = await asyncpg.create_pool(**cfg)
        self.logger = logger
        return self

async def regular_check(check_id: str, scheduler: Optional[ExtendedAsyncScheduler] = None):
    if not scheduler:
        try:
            scheduler: ExtendedAsyncScheduler = apscheduler.current_async_scheduler.get()
        except LookupError as e:
            raise e
    try:
        run_date = datetime.now(tz=timezone.utc) + timedelta(minutes=5)
        await scheduler.add_schedule(regular_check,
                                     DateTrigger(run_time=run_date),
                                     id=f"regular_check-{check_id}",
                                     misfire_grace_time=300,
                                     conflict_policy=apscheduler.ConflictPolicy.replace,
                                     args=(check_id,),
                                     max_jitter=30)
        logger.info(f"Added regular check {check_id} at {run_date}")
    except Exception as e:
        logger.error(f"Error in regular check {check_id}\n{traceback.format_exc()}")

async def check_self(scheduler: Optional[ExtendedAsyncScheduler] = None):
    if not scheduler:
        try:
            scheduler: ExtendedAsyncScheduler = apscheduler.current_async_scheduler.get()
        except LookupError as e:
            raise e
    db: asyncpg.Pool = scheduler.db
    try:
        async with db.acquire() as conn:
            jobs = await conn.fetchrow("SELECT count(*) as total, count(*) filter ( where jobs.task_id is not null and "
                                     "task_id ilike '__main__:regular_check' ) as reg "
                                     "from apscheduler.jobs")
            if not jobs:
                jobs = (0, 0)
            schedules = await conn.fetchrow("SELECT count(*) as total, count(*) filter ( where "
                                     "id ilike 'regular_check%' ) as req from apscheduler.schedules")
            if not schedules:
                schedules = (0, 0)
        now = datetime.utcnow()
        scheduler.logger.error(f"{now}\njobs: {jobs[0]}, schedules: {schedules[0]}, "
                               f"regular_check schedules: {schedules[1]}, regular_check jobs: {jobs[1]}\n"
                               f"regular_check jobs+schedules: {schedules[1] + jobs[1]}")
    except Exception as e:
        logger.error(f"Error in check_self\n{traceback.format_exc()}")

async def create_schedules(scheduler: Optional[ExtendedAsyncScheduler] = None):
    if not scheduler:
        try:
            scheduler: ExtendedAsyncScheduler = apscheduler.current_async_scheduler.get()
        except LookupError as e:
            raise e
    db: asyncpg.Pool = scheduler.db
    try:
        async with db.acquire() as conn:
            for i in range(2000):
                try:
                    schedule = await conn.fetchrow("SELECT * from apscheduler.schedules where id ilike $1",
                                                  f"regular_check-{i}")
                except Exception as e:
                    traceback.print_exc()
                    raise e
                if not schedule:
                    await scheduler.add_job(regular_check, args=(f'{i}',))
        logger.error("Finished creating schedules")
    except Exception as e:
        logger.error(f"Error in create_schedules\n{traceback.format_exc()}")

async def main():
    engine = create_async_engine(
            f"postgresql+asyncpg://{cfg['user']}:{cfg['password']}@{cfg['host']}/{cfg['database']}")
    serializer = JSONSerializer()
    data_store = SQLAlchemyDataStore(engine, serializer=serializer, schema='apscheduler')
    async with ExtendedAsyncScheduler(data_store=data_store,
                                      role=apscheduler.SchedulerRole.both) as scheduler:
        await check_self(scheduler)
        await scheduler.add_schedule(check_self,
                                     IntervalTrigger(minutes=1),
                                     id="check_self",
                                     misfire_grace_time=120)
        # await scheduler.add_job(create_schedules) # creates bug directly before scheduler even starts
        await create_schedules(scheduler) # works fine for the first few minutes and then the bug occurs again
        await check_self(scheduler)
        print('starting scheduler')
        await scheduler.run_until_stopped()

if __name__ == '__main__':
    asyncio.run(main())
agronholm commented 9 months ago

After some runtime some of the schedules are simply gone, the task itself had no errors and ensures in theory always a rescheduling. Meaning the number of schedules drops constantly.

Schedules are supposed to be deleted when they've run their course. Is this what you're seeing? Without a reproducing example I can't say much more.

doluk commented 9 months ago

Yes sure, in my case they should add a new schedule with the same name at the end of the task. I referred to the non existence of the new schedules

doluk commented 9 months ago

I was able to reproduce this bug also with 4.0.0a3. Posting my example now. Just a quick comment: if you use await scheduler.add_job(create_schedules) you get directly the error with the traceback above. Using await create_schedules() instead raises no exception and you dont see anything in the apscheduler logs, but the other logs (bug_demo.log) show that the number of schedules is decreasing over time.

agronholm commented 8 months ago

Could you try with v4.0.0a4?

doluk commented 8 months ago

I upgraded apscheduler, deleted the old tables in the database and tried to use the same code as before, but getting now this error:


2023-11-13 08:17:24,598:ERROR-bug_demo:hon3.10/site-packages/apscheduler/_schedulers/async_.py", line 737, in _process_schedules
async_ - _process_schedules - 739:
Error computing next fire time for schedule 'check_self' of task '__main__:check_self' – removing schedulen next
    self._last_fire_time += self._interval
Traceback (most recent call last):r (not "datetime.timedelta") to str
20File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 737, in _process_schedules
bug_fire_time = calculate_next()
20File "/root/.venv/ccn_tracker/lib/python3.10/site-packages/apscheduler/triggers/interval.py", line 67, in next
jobsself._last_fire_time += self._intervalules: 0, regular_check jobs: 0
TypeError: can only concatenate str (not "datetime.timedelta") to str

sorry my phone messes with the formating in the terminal, i can get you a properly formatted one later

agronholm commented 8 months ago

That looks like there are strings leaking to where only datetimes should go. Can you provide a simpler way to reproduce the problem that you're seeing?

doluk commented 8 months ago

The error is from the Interval trigger, just create a schedule with an interval trigger. It executes ones and then creates this error

agronholm commented 8 months ago

It works with the memory store, but with the sqlalchemy store I'm getting a bizarre (but different) error about the function reference which I'm investigating.

agronholm commented 8 months ago

That was a false alarm, it was caused by a leftover schedule from another run. When I cleared the database, it ran perfectly. Is the PostgreSQL standalone example not working for you?

doluk commented 8 months ago

I created a whole new database and run the standalone example, which works fine for me! I then cleared the whole database and used my code again. Still getting the same error. So I started slowly changing the code of the postgresql standalone example towards my code. And with the addition of the JSON serializer it breaks and starts throwing this error TypeError: can only concatenate str (not "datetime.timedelta") to str. I looked a bit into the code and found that if I change interval.py:45 to _last_fire_time: datetime | None = attrs.field(init=False, eq=False, default=None, converter=as_aware_datetime) it works without the previously mentioned TypeError. I tested with this change the Cbor and Pickle serializer as well, which work fine for both.

agronholm commented 8 months ago

Ok, I missed your use of JSONSerializer there. I've actually just started a discussion on this, and would like to know your reasons for a serializer other than pickle.

doluk commented 8 months ago

Replied there as well. For me it was more debugging, to understand what arguments are going in. I checked my initial report by the way with the pickle serializer and the problem still persists.

doluk commented 8 months ago

And I think I found the problem causing this bug of the disappearing schedules. My example code to reproduce it creates 2000 times the same schedules and numerating them. It first works find but then stops working after around 20 minutes. Every schedule schedule another schedule with the same id regular_check-{number} during the execution. I enabled the logging of apscheduler and counted the invokes and the job completed. Those were always 2000 for every round of schedules. To further understand what is happening I modifed the code of the schedule regular_check. I try to get the old_schedule, which is currently excuted and the new one after it got scheduled. To distinguish them, I added a counter to the id, counting the executions. Now there are not longer any schedules disappearing (meaning the count of schedules is 2000 regular_check plus 1 the self_check). Curious by that I looked into the logs. It appears that sometimes during the execution the schedule is still present and receivable with scheduler.get_schedule() and sometimes not. I suspect that the newly scheduled schedule is somehow getting deleted at the end of the job execution, because they have the same id.

async def regular_check(check_id: str, scheduler: Optional[ExtendedAsyncScheduler] = None):
    if not scheduler:
        try:
            scheduler: ExtendedAsyncScheduler = apscheduler.current_async_scheduler.get()
        except LookupError as e:
            raise e
    check_id = check_id.replace("regular_check-", "")
    [check_no, iteration] = check_id.split("+")
    iteration = int(iteration)
    try:
        old_schedule = await scheduler.get_schedule(f"regular_check-{check_id}")
    except ScheduleLookupError:
        old_schedule = None
    try:
        run_date = datetime.now(tz=timezone.utc) + timedelta(minutes=5)
        await scheduler.add_schedule(regular_check,
                                     DateTrigger(run_time=run_date),
                                     id=f"regular_check-{check_no}+{iteration + 1}",
                                     misfire_grace_time=300,
                                     conflict_policy=apscheduler.ConflictPolicy.exception,
                                     args=(f"{check_no}+{iteration + 1}",),
                                     max_jitter=30)
        try:
            new_schedule = await scheduler.get_schedule(f"regular_check-{check_no}+{iteration + 1}")
        except ScheduleLookupError:
            new_schedule = None
        scheduler.logger.info(f"Old schedule: {old_schedule=}\nNew schedule: {new_schedule=}" +
                              f"\nAdded regular check {check_id} {iteration+1} at {run_date}")
    except Exception as e:
        scheduler.logger.error(f"Error in regular check {check_id}\n{traceback.format_exc()}")

async def create_schedules(scheduler: Optional[ExtendedAsyncScheduler] = None):
    if not scheduler:
        try:
            scheduler: ExtendedAsyncScheduler = apscheduler.current_async_scheduler.get()
        except LookupError as e:
            raise e
    db: asyncpg.Pool = scheduler.db
    try:
        async with db.acquire() as conn:
            for i in range(2000):
                x = str(i)
                while len(x) < 4:
                    x = "0" + x
                try:
                    schedule = await conn.fetchrow("SELECT * from jsonserializer.schedules where id ilike $1",
                                                   f"regular_check-{x}%")
                except Exception as e:
                    traceback.print_exc()
                    raise e
                if not schedule:
                    await scheduler.add_job(regular_check, args=(f'{x}+0',))
        scheduler.logger.error("Finished creating schedules")
    except Exception as e:
        scheduler.logger.error(f"Error in create_schedules\n{traceback.format_exc()}")
agronholm commented 8 months ago

Schedules are always deleted when their triggers don't produce any more fire times, that is normal. Is that what you mean?

doluk commented 8 months ago

Schedule 1 with a datetrigger is executed, during the execution a new schedule 2 with the same id is created. Sometimes during the execution of the job of schedule 1, the schedule 1 is still accessible by the id, sometimes not. If it is present, somehow the schedule 2 disappears after the job of schedule 1 finished.

By using real unique ids this can be avoided. Sadly the part with the KeyError still happens

agronholm commented 8 months ago

Sometimes during the execution of the job of schedule 1, the schedule 1 is still accessible by the id, sometimes not

Is your expectation that the schedule is available as long as there are any jobs active associated with it?

doluk commented 8 months ago

As long as it is consistent I am okay with either one. I would prefer it not being available unless it produces another runtime. Because the schedule exhausted and therefore the schedule id should be free to use again.

agronholm commented 8 months ago

That sounds reasonable. The reason why it works the way it does now is that _process_schedules() creates the jobs, and only after creating the necessary jobs does it remove any finished schedules.

doluk commented 8 months ago

This out of the way, the initial reported error still occurs 🙈 During that I also stumbled across another thing: How does apscheduler handle many pending jobs in combination with jobs from schedules? When I add a lot of jobs before the scheduler is started and have a schedule with an interval trigger, I noticed that after starting up the interval schedule/jobs are not executed in that interval. My expectation would be here, that jobs from schedules have priority compared to jobs which are not connected to a schedule

agronholm commented 8 months ago

This out of the way, the initial reported error still occurs

Can you clarify what you mean?

My expectation would be here, that jobs from schedules have priority compared to jobs which are not connected to a schedule

The schedulers fetch jobs in the FIFO order, by created_at. There is no other prioritization occurring.

doluk commented 8 months ago

the initial error was that a KeyError removing job.id from self._running_jobs

Falydoor commented 5 months ago

the initial error was that a KeyError removing job.id from self._running_jobs

I "fixed" that by doing self._running_jobs.discard(job.id) instead so it doesn't fail if the key is missing. Not a real fix but at least it's not crashing anymore.

agronholm commented 2 months ago

Can you reproduce this with the latest master? I just ran that originally posted script for half an hour without seeing any errors. The only console output I got was starting scheduler, and then some logging present in bug_demo.log, nothing else.

doluk commented 2 months ago

Yes, will have it running over the next few hours

doluk commented 2 months ago

I get no error messages, but the number of jobs is still decaying. After 15 minutes the number of regular_checks droped from 2000 to 911.

agronholm commented 2 months ago

So let me just make sure we're on the same page here. Why are you replacing the schedule from a job belonging to that schedule?

doluk commented 2 months ago

I have an api need to call for a list of identifier over and over again. The time of the next call is dependent on the api response for the identifier. The function making the api calls is regular_check in the example above. Currently the schedule id is regular_check_{identifier}. and the check_id of the example is also the identifier.

With APScheduler v3 I used and sqlite jobstore and never had any issues with this way of doing it. I can see that using the same id again causes issues, but I would argue that the caused behavior is not intuitive (especially considering the used conflictpolicy)

agronholm commented 2 months ago

If you're using DateTrigger, then would it not be possible to create a totally new schedule from the job?

doluk commented 2 months ago

Isn't that what I am doing in regular_checks?

agronholm commented 2 months ago

Ok, I think I didn't read everything thoroughly, so the later snippet is your solution/workaround for the original problem, correct? That would indeed work. The scheduler deletes a schedule when its trigger can no longer generate any new fire times, and this may happen while the job that was spawned from said schedule is still running. I'm wondering if perhaps it shouldn't be? Perhaps instead finished schedules should be cleaned up by the data store cleanup procedure instead which ensures that only schedules with no running jobs are deleted.

doluk commented 2 months ago

Not really solving, because the schedules still get somehow lost. It seems to be related with the number of schedules/jobs. If you keep them below 800 or so they stay stable for hours. So I assume the issue is caused by some kind of bad timing between cleanups, new schedules being stored and the duration of a job.

While I agree it is somewhat silly and fixable at my side, this behavior is not predictable and shouldn't happen.

Beside the solution you suggested, I can think of another one. The check if the Trigger is not producing another schedule should be rechecked/the trigger should be compared before removal of the schedule, to prevent exactly this

doluk commented 2 months ago

And yes the later snippet with the added incremental counter fixes it because then the change in the trigger is also a change in the schedule

agronholm commented 2 months ago

I forgot that this issue should already be fixed in v4.0.0a5. Please let me know either way when you test it.

doluk commented 2 months ago

Still occuring

Soulbadguy54 commented 2 months ago

Hello, get same crashes at random time. Its the scheduler part:

        async with GLOBAL_VARS.SCHEDULER as scheduler:
            await scheduler.add_schedule(DataUpdater.update_faceit_users, IntervalTrigger(days=1), misfire_grace_time=3600, id='faceit_users_parse', max_running_jobs=1)
            await scheduler.add_schedule(DataUpdater.update_lol_matches, IntervalTrigger(minutes=1), misfire_grace_time=15, id='lol_matches_parse', max_running_jobs=1)
            await scheduler.add_schedule(DataUpdater.check_not_announced_matches, IntervalTrigger(seconds=30), misfire_grace_time=15, id='check_not_announced_matches', max_running_jobs=1)
            await scheduler.add_schedule(DataUpdater.update_faceit_matches, IntervalTrigger(minutes=5), misfire_grace_time=180, id='faceit_matches_parse', max_running_jobs=1)
            await scheduler.add_schedule(DISCORD.check_connect, IntervalTrigger(minutes=10), misfire_grace_time=30, id='ds_hearthbeat', max_running_jobs=1)

            logging.info('Program has successfully started [DONE]')

            await scheduler.start_in_background()

            while True:
                await asyncio.sleep(3600)

There is the only addition interaction with shcheduler in code: await GLOBAL_VARS.SCHEDULER.add_job(DataUpdater.update_faceit_match_by_id, kwargs={"match_id": match_id, "player_ids": list(player_ids)})

And sometimes scheduler just crashes with long traceback, but the final part is the same:

self._running_jobs.remove(job.id)
          | KeyError: UUID(<HERE RANDOM ID>)
[2024-05-19 06:43:44,559] - async_.py - Job f3805d3f-db30-4383-a695-b0d56cf46d02 completed successfully
[2024-05-19 06:43:44,561] - async_.py - Job fc5999df-a342-4a1e-a323-a5084efa52e7 was cancelled
[2024-05-19 06:43:44,563] - async_.py - Scheduler crashed
  + Exception Group Traceback (most recent call last):
  |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 784, in run_until_stopped
  |     async with create_task_group() as task_group:
  |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1018, in _process_jobs
    |     await wakeup_event.wait()
    |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1651, in wait
    |     await self._event.wait()
    |   File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    |     await fut
    | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f28f815b610
    | 
    | During handling of the above exception, another exception occurred:
    | 
    | Exception Group Traceback (most recent call last):
    |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 988, in _process_jobs
    |     async with AsyncExitStack() as exit_stack:
    |   File "/usr/lib/python3.10/contextlib.py", line 714, in __aexit__
    |     raise exc_details[1]
    |   File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
    |     cb_suppress = await cb(*exc_details)
    |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
    |     raise BaseExceptionGroup(
    | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
      | Traceback (most recent call last):
      |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
      |     await self.data_store.release_job(self.identity, job, result)
      |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/datastores/memory.py", line 329, in release_job
      |     job_state = self._jobs_by_id.pop(result.job_id)
      | KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
      | 
      | During handling of the above exception, another exception occurred:
      | 
      | Traceback (most recent call last):
      |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
      +------------------------------------
[2024-05-19 06:43:44,573] - main.py -   + Exception Group Traceback (most recent call last):
  |   File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
  |     cb_suppress = await cb(*exc_details)
  |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Exception Group Traceback (most recent call last):
    |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 784, in run_until_stopped
    |     async with create_task_group() as task_group:
    |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
    |     raise BaseExceptionGroup(
    | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
      | Exception Group Traceback (most recent call last):
      |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 988, in _process_jobs
      |     async with AsyncExitStack() as exit_stack:
      |   File "/usr/lib/python3.10/contextlib.py", line 714, in __aexit__
      |     raise exc_details[1]
      |   File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
      |     cb_suppress = await cb(*exc_details)
      |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
      |     raise BaseExceptionGroup(
      | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
      +-+---------------- 1 ----------------
        | Traceback (most recent call last):
        |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
        |     self._running_jobs.remove(job.id)
        | KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
        +------------------------------------

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/root/tusa_daunov/main.py", line 55, in main
  |     async with GLOBAL_VARS.SCHEDULER as scheduler:
  |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 178, in __aexit__
  |     await self._exit_stack.__aexit__(exc_type, exc_val, exc_tb)
  |   File "/usr/lib/python3.10/contextlib.py", line 714, in __aexit__
  |     raise exc_details[1]
  |   File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
  |     cb_suppress = await cb(*exc_details)
  |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/root/tusa_daunov/main.py", line 75, in main
    |     await asyncio.sleep(3600)
    |   File "/usr/lib/python3.10/asyncio/tasks.py", line 605, in sleep
    |     return await future
    | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f28f92ef670
    | 
    | During handling of the above exception, another exception occurred:
    | 
    | Exception Group Traceback (most recent call last):
    |   File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
    |     cb_suppress = await cb(*exc_details)
    |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
    |     raise BaseExceptionGroup(
    | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
      | Exception Group Traceback (most recent call last):
      |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 784, in run_until_stopped
      |     async with create_task_group() as task_group:
      |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
      |     raise BaseExceptionGroup(
      | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
      +-+---------------- 1 ----------------
        | Traceback (most recent call last):
        |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1018, in _process_jobs
        |     await wakeup_event.wait()
        |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1651, in wait
        |     await self._event.wait()
        |   File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
        |     await fut
        | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f28f815b610
        | 
        | During handling of the above exception, another exception occurred:
        | 
        | Exception Group Traceback (most recent call last):
        |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 988, in _process_jobs
        |     async with AsyncExitStack() as exit_stack:
        |   File "/usr/lib/python3.10/contextlib.py", line 714, in __aexit__
        |     raise exc_details[1]
        |   File "/usr/lib/python3.10/contextlib.py", line 697, in __aexit__
        |     cb_suppress = await cb(*exc_details)
        |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
        |     raise BaseExceptionGroup(
        | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
        +-+---------------- 1 ----------------
          | Traceback (most recent call last):
          |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
          |     await self.data_store.release_job(self.identity, job, result)
          |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/datastores/memory.py", line 329, in release_job
          |     job_state = self._jobs_by_id.pop(result.job_id)
          | KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
          | 
          | During handling of the above exception, another exception occurred:
          | 
          | Traceback (most recent call last):
          |   File "/root/.cache/pypoetry/virtualenvs/tusa-daunov-VHJQGi9s-py3.10/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
          |     self._running_jobs.remove(job.id)
          | KeyError: UUID('f3805d3f-db30-4383-a695-b0d56cf46d02')
agronholm commented 2 months ago

It would be helpful to have a minimum workable example. All the snippets in this issue are either unnecessarily complex or incomplete.

One thing to consider is the run time of the jobs. If the jobs run for longer than 15 minutes, they could trigger a known bug which is present even in v4.0.0a5.

@doluk When you say it's still happening, do you mean the original issue (with all those exceptions) or something else, namely the number of schedules being wrong?

@Soulbadguy54 Could you produce a minimum workable example I could run to reproduce the issue using just the memory data store?

doluk commented 2 months ago

It is happening that schedules go missing, sorry worded it badly

agronholm commented 2 months ago

So there are two different issues here: schedules going missing, and the KeyError about job IDs. I'm not sure yet that they're related.

Soulbadguy54 commented 2 months ago

Its can be related, since the scheduler tries to delete missing job or something like this ...

agronholm commented 2 months ago

But if the scheduler tries to delete jobs that aren't there, there would be errors like the ones above. I'm not seeing any, however, when I try to reproduce the issue with your script.

doluk commented 2 months ago

So my example runs eith a postgresql database (version 16) without any decaying number of schedules? How many schedules do you let it create? For smaller numbers below 2000, it takes way longer to occur/never does. If I let it create 6000 I can see them drop within 15 minutes

agronholm commented 2 months ago

As I recall, I saw a decaying number of schedules the last time I ran it, but I didn't see any KeyErrors as in your original report.

doluk commented 2 months ago

yes no keyerrors anymore

rodolfoap commented 1 month ago

apscheduler==4.0.0a5 . Same error here:

INFO:apscheduler._schedulers.sync:Job b6e86506-c57c-492b-be92-5a1b95329563 completed successfully
ERROR:apscheduler._schedulers.sync:Scheduler crashed
  + Exception Group Traceback (most recent call last):
  |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 808, in run_until_stopped
  |     await self.event_broker.publish_local(SchedulerStarted())
  |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1018, in _process_jobs
    |     await wakeup_event.wait()
    |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 1662, in wait
    |     await self._event.wait()
    |   File "/usr/lib/python3.8/asyncio/locks.py", line 309, in wait
    |     await fut
    | asyncio.exceptions.CancelledError
    |
    | During handling of the above exception, another exception occurred:
    |
    | Exception Group Traceback (most recent call last):
    |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1019, in _process_jobs
    |     wakeup_event = anyio.Event()
    |   File "/usr/lib/python3.8/contextlib.py", line 679, in __aexit__
    |     raise exc_details[1]
    |   File "/usr/lib/python3.8/contextlib.py", line 662, in __aexit__
    |     cb_suppress = await cb(*exc_details)
    |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    |     raise BaseExceptionGroup(
    | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (2 sub-exceptions)
    +-+---------------- 1 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
      |     await self.data_store.release_job(self.identity, job, result)
      |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/datastores/memory.py", line 329, in release_job
      |     job_state = self._jobs_by_id.pop(result.job_id)
      | KeyError: UUID('08d6d19f-47b8-4ef6-aae3-efbb3bc90234')
      |
      | During handling of the above exception, another exception occurred:
      |
      | Traceback (most recent call last):
      |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('08d6d19f-47b8-4ef6-aae3-efbb3bc90234')
      +---------------- 2 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
      |     await self.data_store.release_job(self.identity, job, result)
      |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/datastores/memory.py", line 329, in release_job
      |     job_state = self._jobs_by_id.pop(result.job_id)
      | KeyError: UUID('b6e86506-c57c-492b-be92-5a1b95329563')
      |
      | During handling of the above exception, another exception occurred:
      |
      | Traceback (most recent call last):
      |   File "/usr/local/share/monitor/venv/lib/python3.8/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('b6e86506-c57c-492b-be92-5a1b95329563')
      +------------------------------------

My scheduler is simple:

def get_trigger(entry):
        return CronTrigger(minute=entry["minute"], hour=entry["hour"], day=entry["day"],
                month=entry["month"], day_of_week=entry["day_of_week"],)

scheduler = Scheduler()
for plugin in load_plugins("./plugins"):
        ...
        scheduler.add_schedule(plugin_runner_function, get_trigger(plugin.schedule), args=[plugin.args])
HK-Mattew commented 1 month ago

I get the same KeyError error.

If it's important to know, I'm using: apscheduler 4.0.0a5, Mongodb datastore, python3.9.

I see that there are already several other code examples above where the same error is occurring. Because of this, I will not post example code in this post.

Error Log:

Scheduler crashed
  + Exception Group Traceback (most recent call last):
  |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 808, in run_until_stopped
  |     await self.event_broker.publish_local(SchedulerStarted())
  |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1009, in _process_jobs
    |     jobs = await self.data_store.acquire_jobs(self.identity, limit)
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 542, in acquire_jobs
    |     async for attempt in self._retry():
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
    |     do = await self.iter(retry_state=self._retry_state)
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    |     result = await action(retry_state)
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/_utils.py", line 99, in inner
    |     return call(*args, **kwargs)
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/__init__.py", line 392, in <lambda>
    |     self._add_action_func(lambda rs: rs.outcome.result())
    |   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    |     return self.__get_result()
    |   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    |     raise self._exception
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 545, in acquire_jobs
    |     async with await AsyncCursor.create(
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 126, in create
    |     cursor = await to_thread.run_sync(func)
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2128, in run_sync_in_worker_thread
    |     await cls.checkpoint()
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2050, in checkpoint
    |     await sleep(0)
    |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 641, in sleep
    |     await __sleep0()
    |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 630, in __sleep0
    |     yield
    | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ffa443cb880
    | 
    | During handling of the above exception, another exception occurred:
    | 
    | Exception Group Traceback (most recent call last):
    |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    |     result = coro.send(None)
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1019, in _process_jobs
    |     wakeup_event = anyio.Event()
    |   File "/usr/local/lib/python3.9/contextlib.py", line 670, in __aexit__
    |     raise exc_details[1]
    |   File "/usr/local/lib/python3.9/contextlib.py", line 653, in __aexit__
    |     cb_suppress = await cb(*exc_details)
    |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    |     raise BaseExceptionGroup(
    | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (4 sub-exceptions)
    +-+---------------- 1 ----------------
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
      |     result = coro.send(None)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('eef5e05b-03b9-4ec5-9f1a-747847a4bdde')
      +---------------- 2 ----------------
      | Traceback (most recent call last):
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
      |     await self.data_store.release_job(self.identity, job, result)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 630, in release_job
      |     async for attempt in self._retry():
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
      |     do = await self.iter(retry_state=self._retry_state)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
      |     result = await action(retry_state)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/_utils.py", line 99, in inner
      |     return call(*args, **kwargs)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/__init__.py", line 392, in <lambda>
      |     self._add_action_func(lambda rs: rs.outcome.result())
      |   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
      |     return self.__get_result()
      |   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
      |     raise self._exception
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 640, in release_job
      |     await to_thread.run_sync(
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
      |     return await get_async_backend().run_sync_in_worker_thread(
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2128, in run_sync_in_worker_thread
      |     await cls.checkpoint()
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2050, in checkpoint
      |     await sleep(0)
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 641, in sleep
      |     await __sleep0()
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 630, in __sleep0
      |     yield
      | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ffa443cb880
      | 
      | During handling of the above exception, another exception occurred:
      | 
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 258, in __step
      |     result = coro.throw(exc)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('eef5e05b-03b9-4ec5-9f1a-747847a4bdde')
      +---------------- 3 ----------------
      | Traceback (most recent call last):
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
      |     await self.data_store.release_job(self.identity, job, result)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 630, in release_job
      |     async for attempt in self._retry():
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
      |     do = await self.iter(retry_state=self._retry_state)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
      |     result = await action(retry_state)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/_utils.py", line 99, in inner
      |     return call(*args, **kwargs)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/__init__.py", line 392, in <lambda>
      |     self._add_action_func(lambda rs: rs.outcome.result())
      |   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
      |     return self.__get_result()
      |   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
      |     raise self._exception
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 640, in release_job
      |     await to_thread.run_sync(
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
      |     return await get_async_backend().run_sync_in_worker_thread(
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2128, in run_sync_in_worker_thread
      |     await cls.checkpoint()
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2050, in checkpoint
      |     await sleep(0)
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 641, in sleep
      |     await __sleep0()
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 630, in __sleep0
      |     yield
      | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ffa443cb880
      | 
      | During handling of the above exception, another exception occurred:
      | 
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 258, in __step
      |     result = coro.throw(exc)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('658dd562-7268-428f-84a4-d9387f3d31db')
      +---------------- 4 ----------------
      | Traceback (most recent call last):
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1072, in _run_job
      |     await self.data_store.release_job(self.identity, job, result)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 630, in release_job
      |     async for attempt in self._retry():
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
      |     do = await self.iter(retry_state=self._retry_state)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
      |     result = await action(retry_state)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/_utils.py", line 99, in inner
      |     return call(*args, **kwargs)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/tenacity/__init__.py", line 392, in <lambda>
      |     self._add_action_func(lambda rs: rs.outcome.result())
      |   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
      |     return self.__get_result()
      |   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
      |     raise self._exception
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/datastores/mongodb.py", line 640, in release_job
      |     await to_thread.run_sync(
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
      |     return await get_async_backend().run_sync_in_worker_thread(
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2128, in run_sync_in_worker_thread
      |     await cls.checkpoint()
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2050, in checkpoint
      |     await sleep(0)
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 641, in sleep
      |     await __sleep0()
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 630, in __sleep0
      |     yield
      | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ffa443cb880
      | 
      | During handling of the above exception, another exception occurred:
      | 
      | Traceback (most recent call last):
      |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 258, in __step
      |     result = coro.throw(exc)
      |   File "/home/username/myproject/env/lib/python3.9/site-packages/apscheduler/_schedulers/async_.py", line 1076, in _run_job
      |     self._running_jobs.remove(job.id)
      | KeyError: UUID('658dd562-7268-428f-84a4-d9387f3d31db')
      +------------------------------------
agronholm commented 2 days ago

I believe this has been fixed in master now. Let me know if it happens again.