One fix that comes to mind is using CREATE TABLE IF NOT EXISTS; I think this is what SQLAlchemy uses for the checkfirst param of its create_all function.
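A minimal sketch of what emitting CREATE TABLE IF NOT EXISTS explicitly could look like, assuming SQLAlchemy 1.4+ (which exposes an if_not_exists flag on the CreateTable DDL construct); the table here is just a placeholder:

```python
# Sketch only: push the existence check into the DDL itself with IF NOT EXISTS,
# instead of relying on create_all(checkfirst=True)'s separate metadata query.
from sqlalchemy import Column, Integer, MetaData, Table, create_engine
from sqlalchemy.schema import CreateTable

engine = create_engine("sqlite:///db.sqlite3", future=True)
metadata = MetaData()
test = Table("test", metadata, Column("id", Integer, primary_key=True))

with engine.begin() as conn:
    # Emits CREATE TABLE IF NOT EXISTS test (...), so a worker that loses the
    # race gets a no-op instead of a "table already exists" error.
    conn.execute(CreateTable(test, if_not_exists=True))
```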
Update: I just found that gunicorn has server hooks that run exactly once in the master process, before the child workers are spawned: https://docs.gunicorn.org/en/latest/settings.html#server-hooks
# gunicorn_config.py
from sqlservice import SQLClient

from app.model import Model
from app.config import DATABASE_URI

def on_starting(server):
    # Runs once in the gunicorn master process, before any worker is forked.
    dbConfig = {"SQL_DATABASE_URI": DATABASE_URI}
    db = SQLClient(dbConfig, model_class=Model)
    db.create_all()
Then start gunicorn with these params:
gunicorn [...] -w $NUM_WORKER -c gunicorn_config.py
It looks like sqlalchemy defaults to checkfirst=True in create_all: https://docs.sqlalchemy.org/en/14/core/metadata.html?highlight=checkfirst#sqlalchemy.schema.MetaData.create_all.params.checkfirst. But maybe sqlalchemy is doing a separate check before issuing the CREATE TABLE statement, which is what causes the race condition? Would be curious to see if the SQL emitted is the same or different with checkfirst=False vs checkfirst=True.
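One quick way to see that (a sketch of mine, not code from this thread: it just runs create_all against a throwaway in-memory SQLite database with echo=True so every emitted statement gets logged):

```python
# Sketch: compare the statements emitted by create_all for each checkfirst mode.
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)

for checkfirst in (True, False):
    print(f"--- checkfirst={checkfirst} ---")
    # Fresh in-memory database per run, so both modes start from "table missing".
    engine = create_engine("sqlite://", echo=True, future=True)
    Base.metadata.create_all(engine, checkfirst=checkfirst)
```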
Created a relatively small test:
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.orm import declarative_base

engine = create_engine("sqlite:///db.sqlite3", echo=True, future=True)
Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)

import logging
logging.basicConfig(format="%(threadName)s %(asctime)s %(levelname)s %(message)s")

import threading

def worker():
    # Every thread checks for the table and then tries to create it.
    Base.metadata.create_all(engine, checkfirst=True)

threads = []
for _ in range(4):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()

for t in threads:
    t.join()
And yes, it turns out checkfirst=True doesn't actually use CREATE TABLE IF NOT EXISTS; it first queries the table metadata from the information schema, whereas checkfirst=False skips the metadata query entirely. So I guess the workaround of running create_all in the master process, so that it runs exactly once instead of in every parallel worker, is the correct way to do it. It also seems like the right design choice: rather than running CREATE TABLE in every worker, it should only run once.
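In other words, checkfirst=True roughly boils down to a check-then-create pattern like the sketch below (my approximation, not SQLAlchemy's actual internals), and the gap between the existence check and the CREATE TABLE is where parallel workers collide:

```python
# Sketch of the check-then-create pattern that races across workers.
from sqlalchemy import Column, Integer, create_engine, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite:///db.sqlite3", future=True)

# On SQLite the existence check shows up as PRAGMA main.table_info("test");
# other dialects query their information schema / catalog tables instead.
if not inspect(engine).has_table("test"):
    # Another worker can create the table right here, after our check but
    # before our CREATE TABLE, which is exactly the reported race.
    Base.metadata.create_all(engine, checkfirst=False)
```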
Closing this since database migration shouldn't be handled by app initialization anyway.
When using a wsgi server with multiple workers (e.g. gunicorn), all of the workers start almost immediately. This creates a somewhat racy condition where the library thinks the table doesn't exist while it's actually in the process of being created. Code example:
Example logs:
The usual log output should be something like this,
I don't really know how sqlalchemy works internally, but I think the PRAGMA main.table_info("test") in the log output is where it checks for an existing table, and that check causes a racy condition in the wsgi worker case: all of the workers check for the table almost simultaneously, all of them conclude it doesn't exist, and then all of them proceed to create it, which can cause this table already exists error.
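A rough way to simulate this outside of gunicorn (a sketch of mine, not the original code example from this report) is to spawn several worker processes that each call create_all() at startup:

```python
# Sketch: several "workers" race to create the same table at startup.
import multiprocessing

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.exc import OperationalError
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)

def worker(name):
    engine = create_engine("sqlite:///db.sqlite3", future=True)
    try:
        # Each worker checks for the table and then tries to create it; two
        # workers that both pass the check will both issue CREATE TABLE.
        Base.metadata.create_all(engine, checkfirst=True)
        print(f"{name}: create_all ok")
    except OperationalError as exc:
        print(f"{name}: {exc}")

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(f"worker-{i}",)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```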