agronholm / apscheduler

Task scheduling library for Python
MIT License

SQLAlchemyJobStore with AsyncIOScheduler and FastAPI gives OperationalError #499

Open GabrieleCalarota opened 3 years ago

GabrieleCalarota commented 3 years ago

Describe the bug When reloading the fastapi process on Heroku, I get an OperationalError as follows:

2021-03-06 12:30:29,625 ERROR 🔴
From fastapi:load_schedule_or_create_blank
(sqlite3.OperationalError) table apscheduler_jobs already exists
[SQL:
CREATE TABLE apscheduler_jobs (
id VARCHAR(191) NOT NULL,
next_run_time FLOAT,
job_state BLOB NOT NULL,
PRIMARY KEY (id)
)

]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

To Reproduce I'm loading the scheduler on a fastapi web service:

# imports implied by the snippet
from apscheduler.events import EVENT_JOB_ERROR
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.asyncio import AsyncIOScheduler

@router.on_event("startup")
async def load_schedule_or_create_blank():
    """
    Initialize the Scheduler object as a global and load existing schedules from SQLite.
    This allows for persistent schedules across server restarts.
    """
    global Scheduler
    try:
        jobstores = {
            'default': SQLAlchemyJobStore(url='sqlite:///aps_scheduler_jobs.sqlite')
        }
        Scheduler = AsyncIOScheduler(jobstores=jobstores)
        Scheduler.start()
        Scheduler.add_listener(my_listener, EVENT_JOB_ERROR)
        logger.info("Created Schedule Object")
    except Exception as e:
        logger.exception(str(e))
        logger.error("Unable to Create Schedule Object")

Expected behavior It shouldn't give this error. This also results in jobs not being added, or being misfired.

Additional context Can this be connected to the async function declared?

agronholm commented 3 years ago

Does this only happen on Heroku, or could I reproduce it on my own?

GabrieleCalarota commented 3 years ago

@agronholm I've noticed it running on Heroku (I'll share the Procfile below), but I guess this happens anywhere.

web:uvicorn app.main:fastapi_app --host=0.0.0.0 --port=${PORT:-5000} --log-level info --use-colors

I don't know though if it's related to fastapi or SqlAlchemyJobStore with AsyncIOScheduler

agronholm commented 3 years ago

Well, I need to be able to reproduce it locally if you want me to investigate.

GabrieleCalarota commented 3 years ago

@agronholm Locally I've never experienced such an error, but I clearly don't understand the nature of it and why it happens on Heroku (also on a production dyno).

agronholm commented 3 years ago

Ok, but with all due respect, how am I supposed to debug this without the ability to reproduce it?

daddycocoaman commented 3 years ago

Hi. I'm using APScheduler with FastAPI. I don't really know how Heroku works, but I can tell you that even in development with FastAPI, the startup event is not a great place for initializing APScheduler, even with a single worker. I'm betting it's even worse with multiple workers.

You would be better off initializing APScheduler before the FastAPI/uvicorn/gunicorn context (maybe in the package's `__init__.py`) and treating it as a global.

daddycocoaman commented 3 years ago

Sorry...my previous comment won't solve the issue but it's something I noticed.

@agronholm I think the issue is that the SQLAlchemyJobStore doesn't check for the existing table before trying to create it:

    def start(self, scheduler, alias):
        super(SQLAlchemyJobStore, self).start(scheduler, alias)
        self.jobs_t.create(self.engine, True)

There is a checkfirst parameter for .create that defaults to False, but is set to True by default for things like create_all. https://docs.sqlalchemy.org/en/14/core/metadata.html#sqlalchemy.schema.Table.create

agronholm commented 3 years ago

There is a checkfirst parameter for .create that defaults to False, but is set to True by default for things like create_all.

And what value do you see the code you pasted passing to Table.create()?

daddycocoaman commented 3 years ago

Whomp whomp. Reading is fundamental.

At any rate, the issue is related to this, and it looks like I might have been right with my first suggestion. Moving the APScheduler initialization outside of the FastAPI context (specifically where there are multiple workers) should help.

Even though multiple workers weren't specified in the uvicorn command listed, it looks like Heroku adds additional workers by default. Uvicorn also respects WEB_CONCURRENCY, so multiple APScheduler instances are created, which in turn causes that sqlite issue.
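One way to cope with WEB_CONCURRENCY spawning several processes is to gate the scheduler behind an explicit flag, so that exactly one process starts it. A sketch (the RUN_SCHEDULER variable name is made up for illustration):

```python
import os

def should_run_scheduler() -> bool:
    # Heroku and uvicorn fork one process per worker (WEB_CONCURRENCY), and
    # each would otherwise start its own scheduler against the same database.
    # Gate startup behind an env var that is set on exactly one process.
    return os.environ.get("RUN_SCHEDULER", "0") == "1"

# in the startup event:
#     if should_run_scheduler():
#         Scheduler.start()
```

This only prevents duplicate schedulers; jobs added via the web API still land in the shared job store, where the single running scheduler picks them up.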

agronholm commented 3 years ago

Are you running the scheduler in web workers? That will never work. Heroku has specific instructions for this.

GabrieleCalarota commented 3 years ago

Thanks @daddycocoaman for the hints. I've tried putting things earlier, in the main.py where the fastapi app is loaded, but somehow that gets run multiple times, whereas this doesn't seem to happen in the startup event. (As pointed out in the fastapi docs, it seems like a good place to initialize things: https://fastapi.tiangolo.com/advanced/events/)

Initializing apscheduler in the startup event was also suggested in this article.

I'll try to move the initialization back in the main.py

GabrieleCalarota commented 3 years ago

@agronholm Regarding the specific instructions for a scheduled process: do you think I should split the web process and the scheduler process in two? The jobs are necessarily added by the web process, because they are set when a certain user calls certain APIs.

daddycocoaman commented 3 years ago

The startup event is typically the right place in FastAPI for initializing schedulers, databases or whatever. But that article does not account for the nuance of hosting with multiple workers, which is the source of your problem.

Also, I should point out that gunicorn is meant for production environments (as opposed to uvicorn), and you might have some luck using its --preload switch.
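For reference, a Procfile line along those lines might look like this (the app module path is copied from the earlier uvicorn command; the worker count is illustrative):

```shell
# gunicorn with the uvicorn worker class; --preload imports the app once in
# the master process before forking, so module-level setup runs a single time
web: gunicorn app.main:fastapi_app --worker-class uvicorn.workers.UvicornWorker --workers 2 --preload --bind 0.0.0.0:${PORT:-5000}
```

Note that --preload only deduplicates import-time work; a scheduler started before the fork does not carry over cleanly into the workers, so a separate scheduler process (as suggested above) is still the safer option.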

GabrieleCalarota commented 3 years ago

I've partially solved the issue, although I still see errors! Since Heroku cycles the filesystem, sqlite wasn't appropriate as a job store (the file got wiped every 24h), so I changed it to a postgresql database.

I've moved the initialization outside the startup event. For some reason I couldn't make it work with gunicorn, so I'm still running it with uvicorn even in production.

With only 1 web concurrency process I no longer see that issue, but this morning I got this:

2021-03-27 06:57:09,587 ERROR 🔴
From fastapi:add_delete_cart_job
(psycopg2.errors.AdminShutdown) terminating connection due to administrator command
SSL connection has been closed unexpectedly

[SQL: INSERT INTO apscheduler_jobs (id, next_run_time, job_state) VALUES (%(id)s, %(next_run_time)s, %(job_state)s)]
[parameters: {'id': 'pending_3293', 'next_run_time': 1616829129.572856, 'job_state': <psycopg2.extensions.Binary object at 0x7f6a08cc1630>}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

Is this SQLAlchemy job store reliable enough to run in production?

agronholm commented 3 years ago

In apscheduler 3.x, the scheduler doesn't really deal with unexpected job store errors. In v4.0 (current master), I will try to make it a bit more resilient against errors and have it try again, or at least crash the scheduler if that happens.