One fix that comes to mind is using CREATE TABLE IF NOT EXISTS; I think this is what SQLAlchemy uses for the checkfirst param of its create_all function.
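A minimal sketch of what emitting CREATE TABLE IF NOT EXISTS explicitly could look like, assuming SQLAlchemy 1.4+ (which exposes an if_not_exists flag on the CreateTable DDL construct); the table here is just a placeholder:

```python
# Sketch only: push the existence check into the DDL itself with IF NOT EXISTS,
# instead of relying on create_all(checkfirst=True)'s separate metadata query.
from sqlalchemy import Column, Integer, MetaData, Table, create_engine
from sqlalchemy.schema import CreateTable

engine = create_engine("sqlite:///db.sqlite3", future=True)
metadata = MetaData()
test = Table("test", metadata, Column("id", Integer, primary_key=True))

with engine.begin() as conn:
    # Emits CREATE TABLE IF NOT EXISTS test (...), so a worker that loses the
    # race gets a no-op instead of a "table already exists" error.
    conn.execute(CreateTable(test, if_not_exists=True))
```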
Update: I just found that gunicorn has server hooks that run exactly once in the master process, before the child workers are spawned: https://docs.gunicorn.org/en/latest/settings.html#server-hooks
# gunicorn_config.py
from sqlservice import SQLClient

from app.model import Model
from app.config import DATABASE_URI

def on_starting(server):
    # Runs once in the gunicorn master process, before any worker is forked.
    dbConfig = {"SQL_DATABASE_URI": DATABASE_URI}
    db = SQLClient(dbConfig, model_class=Model)
    db.create_all()
Then start gunicorn with these params:
gunicorn [...] -w $NUM_WORKER -c gunicorn_config.py
It looks like sqlalchemy defaults to checkfirst=True in create_all: https://docs.sqlalchemy.org/en/14/core/metadata.html?highlight=checkfirst#sqlalchemy.schema.MetaData.create_all.params.checkfirst. But maybe sqlalchemy is doing a separate check before issuing the CREATE TABLE statement, which is what causes the race condition? Would be curious to see if the SQL emitted is the same or different with checkfirst=False vs checkfirst=True.
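One quick way to see that (a sketch of mine, not code from this thread: it just runs create_all against a throwaway in-memory SQLite database with echo=True so every emitted statement gets logged):

```python
# Sketch: compare the statements emitted by create_all for each checkfirst mode.
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)

for checkfirst in (True, False):
    print(f"--- checkfirst={checkfirst} ---")
    # Fresh in-memory database per run, so both modes start from "table missing".
    engine = create_engine("sqlite://", echo=True, future=True)
    Base.metadata.create_all(engine, checkfirst=checkfirst)
```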
Created a relatively small test:
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.orm import declarative_base

engine = create_engine("sqlite:///db.sqlite3", echo=True, future=True)
Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)

import logging
logging.basicConfig(format="%(threadName)s %(asctime)s %(levelname)s %(message)s")

import threading

def worker():
    # Every thread checks for the table and then tries to create it.
    Base.metadata.create_all(engine, checkfirst=True)

threads = []
for _ in range(4):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()

for t in threads:
    t.join()
And yes, it turns out checkfirst=True doesn't actually use CREATE TABLE IF NOT EXISTS; it first queries the table metadata from the information schema, whereas checkfirst=False skips the metadata query entirely. So I guess the workaround of running create_all in the master process, so that it runs exactly once instead of in every parallel worker, is the correct way to do it. It also seems like the right design choice: rather than running CREATE TABLE in every worker, it should only run once.
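In other words, checkfirst=True roughly boils down to a check-then-create pattern like the sketch below (my approximation, not SQLAlchemy's actual internals), and the gap between the existence check and the CREATE TABLE is where parallel workers collide:

```python
# Sketch of the check-then-create pattern that races across workers.
from sqlalchemy import Column, Integer, create_engine, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite:///db.sqlite3", future=True)

# On SQLite the existence check shows up as PRAGMA main.table_info("test");
# other dialects query their information schema / catalog tables instead.
if not inspect(engine).has_table("test"):
    # Another worker can create the table right here, after our check but
    # before our CREATE TABLE, which is exactly the reported race.
    Base.metadata.create_all(engine, checkfirst=False)
```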
Closing this since database migration shouldn't be handled by app initialization anyway.
When using a wsgi server with multiple workers (e.g. gunicorn), all of the workers start almost immediately. This creates a somewhat racy condition where the library thinks the table doesn't exist while it's actually in the process of being created. Code example:
Example logs:
The usual log output should be something like this,
I don't really know how sqlalchemy works internally, but I think the PRAGMA main.table_info("test") in the log output is where it checks for an existing table, and that check causes a racy condition in the wsgi worker case: all of the workers check for the table almost simultaneously, all of them conclude it doesn't exist, and then all of them proceed to create it, which can cause this table already exists error.
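A rough way to simulate this outside of gunicorn (a sketch of mine, not the original code example from this report) is to spawn several worker processes that each call create_all() at startup:

```python
# Sketch: several "workers" race to create the same table at startup.
import multiprocessing

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.exc import OperationalError
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)

def worker(name):
    engine = create_engine("sqlite:///db.sqlite3", future=True)
    try:
        # Each worker checks for the table and then tries to create it; two
        # workers that both pass the check will both issue CREATE TABLE.
        Base.metadata.create_all(engine, checkfirst=True)
        print(f"{name}: create_all ok")
    except OperationalError as exc:
        print(f"{name}: {exc}")

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(f"worker-{i}",)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```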