agronholm / apscheduler

Task scheduling library for Python
MIT License
5.98k stars 694 forks source link

Task deserialization error with SQLAlchemyDatastore #815

Closed nuno-andre closed 8 months ago

nuno-andre commented 8 months ago

Things to check first

Version

https://github.com/agronholm/apscheduler/commit/a67a82c821d5eeb882192d453dea215f7e8987bf

What happened?

When the scheduler tries to configure a task created in a previous run raises a validation error. This doesn't happen when the task is created for the first time.

(I've come across this error both with SQLite and PostgreSQL.)

      | Traceback (most recent call last):
      |   File "/home/nuno/repos/fands/_soc-services/aptest/test2.py", line 26, in main
      |     await scheduler.add_schedule(tick, IntervalTrigger(seconds=1), id="tick")
      |   File "/home/nuno/repos/fands/_soc-services/.venv/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 409, in add_schedule
      |     task = await self.configure_task(
      |   File "/home/nuno/repos/fands/_soc-services/.venv/lib/python3.10/site-packages/apscheduler/_schedulers/async_.py", line 308, in configure_task
      |     task.func = func
      |   File "/home/nuno/repos/fands/_soc-services/.venv/lib/python3.10/site-packages/attr/_make.py", line 1068, in __setattr__
      |     nval = hook(self, a, val)
      |   File "/home/nuno/repos/fands/_soc-services/.venv/lib/python3.10/site-packages/attr/setters.py", line 23, in wrapped_pipe
      |     rv = setter(instance, attrib, rv)
      |   File "/home/nuno/repos/fands/_soc-services/.venv/lib/python3.10/site-packages/attr/setters.py", line 52, in validate
      |     v(instance, attrib, new_value)
      |   File "/home/nuno/repos/fands/_soc-services/.venv/lib/python3.10/site-packages/attr/validators.py", line 269, in __call__
      |     self.validator(inst, attr, value)
      |   File "/home/nuno/repos/fands/_soc-services/.venv/lib/python3.10/site-packages/attr/_make.py", line 2927, in __call__
      |     v(inst, attr, value)
      |   File "/home/nuno/repos/fands/_soc-services/.venv/lib/python3.10/site-packages/attr/validators.py", line 100, in __call__
      |     raise TypeError(
      | TypeError: ("'func' must be <class 'str'> (got <function tick at 0x7f8260363d90> that is a <class 'function'>).", Attribute(name='func', default=NOTHING, validator=<optional validator for _AndValidator(_validators=(<instance_of validator for type <class 'str'>>, <matches_re validator for pattern re.compile('.+:.+')>)) or None>, repr=True, eq=False, eq_key=None, order=False, order_key=None, hash=None, init=True, metadata=mappingproxy({}), type='str | None', converter=None, kw_only=True, inherited=False, on_setattr=None, alias='func'), <class 'str'>, <function tick at 0x7f8260363d90>)

Commenting the following lines makes the code to run successfully.

https://github.com/agronholm/apscheduler/blob/a67a82c821d5eeb882192d453dea215f7e8987bf/src/apscheduler/_schedulers/async_.py#L307-L309

How can we reproduce the bug?

Run this code (it will create a new DB), stop it, and run it again (with the previously created DB).

import asyncio
from datetime import datetime

from sqlalchemy.ext.asyncio import create_async_engine
from apscheduler import AsyncScheduler
from apscheduler.datastores.sqlalchemy import SQLAlchemyDataStore
from apscheduler.triggers.interval import IntervalTrigger

def tick():
    print("Hello, the time is", datetime.now())

async def main():
    engine = create_async_engine('sqlite+aiosqlite:///test.sqlite')
    data_store = SQLAlchemyDataStore(engine)
    async with AsyncScheduler(data_store) as scheduler:
        await scheduler.add_schedule(tick, IntervalTrigger(seconds=1), id="tick")
        await scheduler.run_until_stopped()

asyncio.run(main())
agronholm commented 8 months ago

I see another problem too: on the first run, CPU usage goes to 100%. I don't know why, but I'll investigate.

agronholm commented 8 months ago

I found the problem: it was some leftover logic from a previous, unsuccessful attempt at implementing configure_task(). I have to investigate the other problem before I push the fix though.

agronholm commented 8 months ago

When I fix the old logic, it still uses up 100% CPU without doing anything useful. This only happens with aiosqlite and sqlite; it works correctly with PostgreSQL.

agronholm commented 8 months ago

Something is seriously wrong with the sqlite logic:

DEBUG:apscheduler._schedulers.async_:Sleeping -263.660 seconds until the next fire time (2023-11-12 02:00:46.513514+02:00)
DEBUG:apscheduler._schedulers.async_:Sleeping -263.661 seconds until the next fire time (2023-11-12 02:00:46.513514+02:00)
DEBUG:apscheduler._schedulers.async_:Sleeping -263.663 seconds until the next fire time (2023-11-12 02:00:46.513514+02:00)
agronholm commented 8 months ago

It's looking like acquire_schedules() is returning no results. This is puzzling, as there are several tests passing against the sqlite stores and I don't know why they would pass but this won't work. I suspect it has to do with the lack of date/time handling in sqlite.

agronholm commented 8 months ago

Got it: it's due to sqlite doing timestamp comparisons as text. This would've been fine without the time zone offsets, but they make datetime comparison difficult if not impossible on sqlite. The reason this wasn't caught by tests is because the test data used UTC offsets which the data store also uses for queries, so it all worked.

agronholm commented 8 months ago

I solved it by special casing sqlite so that the index is created on the expression unix_epoch(next_fire_time) and the same function is compared against now.timestamp(). I'll push the fix tomorrow once I have accompanying tests in place.

agronholm commented 8 months ago

As soon as I updated the tests to use a positive UTC offset in the timestamps, I discovered that MySQL had a similar issue. My initial fix for that hasn't passed the tests yet.

agronholm commented 8 months ago

Ok...so the solution that works for MySQL only works for MySQL 8.0 onwards, and doesn't work on MariaDB :disappointed: Back to the drawing board I guess...

agronholm commented 8 months ago

After a really ugly workaround, everything seems to be working now.

nuno-andre commented 8 months ago

Maybe using only UTC internally and making the conversion in the input/output interfaces...

I have been bitten by timezone messes too many times... Now I only use them in user interfaces.

BTW, this seems an amazing refactoring. I've pinned it to start using it ASAP.

agronholm commented 8 months ago

Maybe using only UTC internally and making the conversion in the input/output interfaces...

Won't work, as we need the actual time zone information too which would be lost this way.