fastapi / full-stack-fastapi-template

Full stack, modern web application template. Using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS and more.
MIT License

Db connection management - performances #290

Open meddario opened 4 years ago

meddario commented 4 years ago

Hi guys,

this is based on the discussion here: https://github.com/tiangolo/full-stack-fastapi-postgresql/issues/104 .

I tried a few benchmarks, with:

I found out that - as seen by others - Depends doesn't work under load; it just exhausts the connections, so I'll leave that solution out. SessionManager works, but I found an (apparently) more performant solution:

SessionManager solution (only relevant parts):

# models.py
engine = create_engine(
    os.getenv('DATABASE_URL'), pool_recycle=3600, pool_size=5)

SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

# SQLAlchemy model definitions....

class SessionManager:
    def __init__(self):
        self.db = SessionLocal()

    def __enter__(self):
        return self.db

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.db.close()

# api.py
from models import SessionManager

@app.post("/bla")
def create_bla(bla: BlaModel):
    with SessionManager() as conn:
        ...  # use conn with SQLAlchemy models - insert

@app.get("/bla")
def index_bla():
    with SessionManager() as conn:
        ...  # use conn with SQLAlchemy models - select

scoped_session solution (only relevant parts):

# models.py
from sqlalchemy.orm import sessionmaker, scoped_session

# SQLAlchemy model definitions....

def database_setup():
    engine = create_engine(
        os.getenv('DATABASE_URL'), pool_recycle=3600, pool_size=5)
    Base.metadata.bind = engine
    return scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))

# api.py
from models import database_setup

conn = database_setup()

@app.post("/bla")
def create_bla(bla: BlaModel):
    ...  # directly use conn with SQLAlchemy models - insert

@app.get("/bla")
def index_bla():
    ...  # directly use conn with SQLAlchemy models - select

Results: Doing some tests with jmeter, the second implementation seems faster; for example, with 20 concurrent threads in jmeter (left: SessionManager, right: scoped_session):

SessionManager:  "throughput": 46.987912232295365, "meanResTime": 811.7986717930802
scoped_session:  "throughput": 49.83682907230358,  "meanResTime": 765.7788746298123

My question is: is the approach with scoped_session correct, or is there some flaw that I can't see? I didn't get any connection errors during tests, even with 100 threads. I am not a FastAPI expert (or an expert Python programmer), so if there is some silly error, I would be happy to know! Thanks in advance!

sm-Fifteen commented 3 years ago

A couple of things I want to point out:

* pool_size has a default value of 5 connections it will keep open at all times, and max_overflow has a default value of 10 connections it can open on demand beyond the pool's capacity, so your SQLAlchemy engine runs out of connections to use for serving requests after 15 and will block the thread of any other request that needs one while waiting for connections to be freed.
* The ThreadPoolExecutor FastAPI is using to run non-async routes and dependencies without blocking the main thread (the one handling connections and running async code) has a capacity equal to the number of cores (both logical and physical) you have plus 4, maxing out at 32, and since all your functions are def and not async def (they kind of have to be, since SQLAlchemy won't get async support until version 1.4, and making blocking functions async def would only serve to block your event loop and stall your whole application instead of blocking one thread in your threadpool), each of these requests will occupy one thread in this thread pool for the entirety of its lifetime, so it may have trouble handling more requests than however many threads your ThreadPoolExecutor has.
* SQLAlchemy's sessionmaker works using thread-local storage to keep track of sessions, which will work when using FastAPI in a fully sync way, but might cause you to run into issues when async gets thrown in because requests and threads aren't mapped 1:1 in FastAPI, which is why I usually recommend using Session(bind=db_conn) directly with a setup like you have.

You can't really handle more requests at any given time than the number of connections your connection pool can give you (15 connections for 100 requests means you have to start with 85 requests queued up), nor can you have more sync requests waiting than you have threads in your thread pool (if you have a 4-core + 4-"logical-thread" machine, that's 12 threads in your thread pool).
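The thread-pool figures quoted above follow CPython's default ThreadPoolExecutor sizing rule, `min(32, cpu_count + 4)`, which Starlette relied on at the time; a minimal sketch of that arithmetic (the function name is ours, for illustration):

```python
def default_max_workers(cpu_count: int) -> int:
    # CPython's ThreadPoolExecutor default since 3.8: min(32, cpu_count + 4)
    return min(32, cpu_count + 4)

# 4 physical cores + 4 hyperthreads = 8 logical CPUs -> 12 worker threads
print(default_max_workers(8))   # -> 12
# ...and the cap kicks in on large machines
print(default_max_workers(64))  # -> 32
```

So a request burst larger than this number of sync def handlers will start queueing regardless of how the database pool is configured.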

meddario commented 3 years ago

Hi @sm-Fifteen, thank you very much for the additional information (I am not a FastAPI expert, I was just trying to make sense of these results).

So, basically:

* Using `sessionmaker` is correct as long as everything is sync
* If I go over 32 simultaneous connections with my last example, I will get an error because there wouldn't be available threads to keep the requests waiting?

Thanks and kind regards!

sm-Fifteen commented 3 years ago

So, basically:

* Using `sessionmaker` is correct as long as everything is sync

Given how FastAPI works (with dependencies being injected on demand, so ideally you wouldn't have a whole lot of stuff existing outside of functions, which is sessionmaker's case), I would say that using Session directly is better regardless of whether your requests are sync or async. Again, see my comment on https://github.com/tiangolo/fastapi/issues/726#issuecomment-584371305.

* If I go over 32 simultaneous connections with my last example, I will get an error because there wouldn't be available threads to keep the requests waiting?

No, if you go over that limit, the async thread will most likely stall and pause handling incoming connections while waiting for a thread to be freed. If your application is behind a reverse proxy, that would likely cause you to run into 502 errors (bad gateway) after a few seconds, as the proxy tried contacting the application and the application didn't even "pick up the phone" (if you want a more visual metaphor), which is what usually happens when the event loop blocks.

meddario commented 3 years ago

Given how FastAPI works (with dependencies being injected on demand, so ideally you wouldn't have a whole lot of stuff existing outside of functions, which is sessionmaker's case), I would say that using Session directly is better regardless of whether your requests are sync or async. Again, see my comment on tiangolo/fastapi#726 (comment).

Thanks, do you have an example of using Session directly? I see this in one of your comments in the thread you linked: https://github.com/tiangolo/fastapi/issues/726#issuecomment-557687526

No, if you go over that limit, the async thread will most likely stall and pause handling incoming connections while waiting for a thread to be freed. If your application is behind a reverse proxy, that would likely cause you to run into 502 errors (bad gateway) after a few seconds, as the proxy tried contacting the application and the application didn't even "pick up the phone" (if you want a more visual metaphor), which is what usually happens when the event loop blocks.

Ok, this is clear, thank you!

pikeas commented 3 years ago

Bringing over my comments from https://github.com/tiangolo/fastapi/issues/726:

Starlette discourages on_* handlers, and @sm-Fifteen, sessionmaker is a thin wrapper around Session that passes through default arguments, including the engine.

There seems to be a lot of confusion about the best way to setup and teardown DB connections, and whether some of these ways are performance traps.

@tiangolo Could you provide guidance on how to best manage DB lifecycle in consideration of all the various factors (--reload, only opening a session on routes that need it, using a connection pool, cleanly closing/rolling back transactions, etc)?

A few additional questions:

1. Why is autoflush=False included in FastAPI examples? What goes wrong when setting this to True?
2. Assuming that yielding a new session is the recommended method, is there a reason to prefer async def vs def?
3. Is there any reason to prefer Depends(get_db) vs with get_db() as session (or async with)?
4. Should the generator do a session.commit() and except: session.rollback(), or should this be explicit in the endpoints or handled implicitly by SQLAlchemy's teardown?

Putting it all together, we have something like this:

from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

from .core import settings

# Does not connect until used, pool_pre_ping applies when requesting a pool connection
engine = create_engine(settings.DB.DSN, pool_pre_ping=True)

_Session = sessionmaker(
    bind=engine,
    autoflush=False,
)

def close_engine():
    # For lifecycle shutdown
    engine.dispose()

@contextmanager
def get_db():
    # Explicit type because sessionmaker.__call__ stub is Any
    session: Session = _Session(bind=engine)

    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

Usage:


def on_shutdown():
    db.close_engine()

app = FastAPI(
    ...,
    # TODO lifespan=... when supported
    on_shutdown=[on_shutdown],
)

@app.get('/foo')
def foo(db: Session = Depends(get_db)): ...

@app.get('/foo2')
def foo2():
    with get_db() as session:
        ...

# Or as above but with async

sm-Fifteen commented 3 years ago

Starlette discourages on_* handlers, and @sm-Fifteen sessionmaker is a thin wrapper around Session that passes through default arguments, including the engine.

@pikeas: on_* events aren't discouraged yet to my knowledge; they're now hooks for the lifespan context manager (see encode/starlette#799) which replaces them. As of right now, though, that new feature hasn't been integrated into FastAPI yet (I'm not even sure if it's documented at all in Starlette), which would be the ideal solution for tiangolo/fastapi#617. In the meantime, unless you want to try using the context manager approach by hand, I'd still recommend the approach from https://github.com/tiangolo/fastapi/issues/726#issuecomment-557687526.

A few additional questions:

2. Assuming that yielding a new session is the recommended method, is there a reason to prefer `async def` vs `def`?

FastAPI tries to run routes and dependencies that are defined as def functions in a threadpool to avoid blocking, but if you can guarantee that a dependency will never block for any amount of time under any circumstances, it's slightly faster to use async def since that makes it run on the same thread. If you look at my example comment on the other issue, the only such async def I use is this one:

_db_conn: Optional[Database]

async def get_db_conn() -> Database:
    assert _db_conn is not None
    return _db_conn

...because spinning up a thread for such a trivial getter would definitely be wasteful, and it doesn't call any function that I can't be sure won't block.

If that's what you were asking, given how your get_db() calls a lot of potentially blocking session methods, I would advise against using async def.

3. Is there any reason to prefer `Depends(get_db)` vs `with get_db() as session` (or `async with`)?

It means you can actually call the function that takes Depends(get_db) as a default parameter with a different database by passing the object as a parameter, without having to change the code anywhere else, which can be useful for things like testing with a mock DB or function reuse.
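That substitution trick can be sketched without a running FastAPI app; the `Depends` class and every name below are simplified stand-ins for illustration only (real FastAPI resolves the dependency during request handling):

```python
class Depends:
    """Minimal stand-in for fastapi.Depends, for illustration only."""
    def __init__(self, dependency):
        self.dependency = dependency

def get_db():
    # Would normally create and yield a real database session
    return "real-session"

def index_bla(db=Depends(get_db)):
    # FastAPI resolves Depends(...) before calling the route; when we call
    # the function directly, we can simply pass a replacement instead.
    if isinstance(db, Depends):
        db = db.dependency()
    return f"queried with {db}"

print(index_bla())                   # normal call: real dependency resolved
print(index_bla(db="mock-session"))  # test call: mock injected, no code changes
```

This is the same idea behind FastAPI's dependency override mechanism for tests: the route's code never mentions which concrete session it gets.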

4. Should the generator do a `session.commit()` and `except: session.rollback()` or should this be explicit in the endpoints or handled implicitly by SQLAlchemy's teardown?

I believe rollback is always implicit if you exit a session without committing, and I personally prefer explicit commits on the caller's side to doing it implicitly via the context manager, because it gives the caller more control over how the transaction flows (and it lets you do quick and dirty smoke tests by running a route that writes to the DB and just commenting out the part that commits the transaction).

pikeas commented 3 years ago

Thanks for the thorough reply! One quick note about on_*...

Starlette discourages on_* handlers, and @sm-Fifteen, sessionmaker is a thin wrapper around Session that passes through default arguments, including the engine.

@pikeas: on_* events aren't discouraged yet to my knowledge; they're now hooks for the lifespan context manager (see encode/starlette#799) which replaces them. As of right now, though, that new feature hasn't been integrated into FastAPI yet (I'm not even sure if it's documented at all in Starlette), which would be the ideal solution for tiangolo/fastapi#617. In the meantime, unless you want to try using the context manager approach by hand, I'd still recommend the approach from tiangolo/fastapi#726 (comment).

It's actually right in Starlette's code at https://github.com/encode/starlette/blob/master/starlette/applications.py#L114-L115

The following usages are now discouraged in favour of configuration during Starlette.__init__(...)

sandys commented 3 years ago

@sm-Fifteen is your recommendation here benchmarked? https://github.com/tiangolo/fastapi/issues/726#issuecomment-557687526

Will it work in all situations without lockups, etc.? We are planning to use this with PostgreSQL.

sm-Fifteen commented 3 years ago

@sm-Fifteen is your recommendation here benchmarked? tiangolo/fastapi#726 (comment)

I have not run any load-testing benchmarks on it, no. When I posted this comment, it was more of an architectural suggestion to avoid initializing anything in the global scope (outside the ASGI application's scope) to take advantage of dependency injection everywhere.

Will it work in all situations without lockups, etc.? We are planning to use this with PostgreSQL.

@sandys: open_database_connection_pools and close_database_connection_pools are blocking, but run outside the normal flow of your server; async get_db_conn is just a "scoped singleton" getter, so it will not block. You may want to check get_db_sess, since I'm not sure whether the Session constructor blocks, and sess.close() probably does (at least when it needs to roll back), so it would probably be safer to mark it as def rather than async def.
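The "safer as def" advice works because FastAPI offloads def dependencies to its thread pool so the event loop never waits on them. The same offloading can be done by hand with the standard library; `blocking_close` below is a hypothetical stand-in for a Session.close() that might block:

```python
import asyncio
import time

def blocking_close():
    # Stand-in for a Session.close() that may block (e.g. on rollback)
    time.sleep(0.05)
    return "closed"

async def shutdown_session():
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool so the event loop
    # stays free to handle other requests in the meantime.
    return await loop.run_in_executor(None, blocking_close)

print(asyncio.run(shutdown_session()))  # -> closed
```

FastAPI does essentially this for you whenever a dependency or route is declared with plain def.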

Like I mentioned in https://github.com/tiangolo/full-stack-fastapi-postgresql/issues/290#issuecomment-723155540, another thing you'll want to check when testing for performance is whether FastAPI's threadpool ends up filling up when offloading all the non-async defs you're running from the main thread, which may also cause performance problems, though I'm saying that from having looked at the code, not from having explicitly tested that case myself.

It's actually right in Starlette's code at https://github.com/encode/starlette/blob/master/starlette/applications.py#L114-L115

The following usages are now discouraged in favour of configuration during Starlette.__init__(...)

@pikeas: Ah, that's regarding the declarative API Starlette added in 0.13 (see encode/starlette#704), which was agreed to be a poor fit for FastAPI when the question of supporting it was brought up in tiangolo/fastapi#687 (I think the API is great, it's just that it wouldn't mesh very well with FastAPI's very function-oriented style; see my comment in that thread). The best approach in the future would be the context manager approach I mentioned earlier, which has been implemented in Starlette but still needs support to be added on the FastAPI side of things.

sandys commented 3 years ago

@sm-Fifteen thanks for this! Your comment here is pure gold, as well as the previous one: https://github.com/tiangolo/full-stack-fastapi-postgresql/issues/290#issuecomment-723155540. It's the first discussion of production performance tuning of FastAPI!

Would you be willing to add some color to this? I know it may be asking too much, but guidance on how to turn the knobs would be very welcome (and there seem to be quite a few of them).

sm-Fifteen commented 3 years ago

@sandys: Unfortunately, that's about the limit of what I know regarding FastAPI performance. You can probably further reduce blocking by starting uvicorn with --workers=4 or some other number to spawn multiple processes of your application, but that's not something I've tested myself either, I'm just aware of it.

FastAPI could really use a "Performance" section in its documentation; you might want to mention that in the FastAPI developer survey if you haven't taken it yet.

sandys commented 3 years ago

@sm-Fifteen I have a new way to do it instead of a global sqlalchemy engine: using lru_cache and contextmanager.

https://gist.github.com/sandys/671b8b86ba913e6436d4cb22d04b135f

Do you think it works just as well?

sm-Fifteen commented 3 years ago

@lru_cache() on a function with no parameters only creates the database engine as a (lazily initialized) global singleton, which is basically equivalent to just running create_engine in the global scope. The main difference between the two is that the first request calling get_engine() will have to wait for the connection to be established, as that step is no longer performed before the FastAPI application goes up.

The reason I was recommending on_event() lifecycle managers earlier is that they make sure your SQLAlchemy engine is initialized before the application runs, and also that it's managed with the lifecycle of the application, which enables your ASGI server, like Uvicorn, to run proper cleanup of both sync and async resources before exiting, instead of assuming that it will happen automatically (which it might not: Python tries to run garbage collection before exiting but this is not perfect, and uvicorn's --reload option means your code gets reloaded without exiting the Python interpreter first). lru_cache doesn't really resolve that, it merely makes sure that get_engine() only runs once.
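The lazy-singleton behaviour described here is easy to demonstrate; `get_engine` below is a stand-in that counts calls instead of creating a real engine:

```python
from functools import lru_cache

calls = 0

@lru_cache()
def get_engine():
    global calls
    calls += 1        # stands in for the (potentially blocking) create_engine()
    return object()   # hypothetical engine object

# Nothing happens at import time: the engine is created lazily, so the
# *first* request through this getter pays the connection-setup cost.
assert calls == 0

e1 = get_engine()
e2 = get_engine()
assert calls == 1 and e1 is e2  # created once, shared thereafter
print("engine created once")
```

Which is exactly the point above: lru_cache deduplicates the call, but it neither front-loads the setup before the app starts nor hooks teardown into the application lifecycle.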

With that said, the exact way you initialize your DB engine isn't how you're going to improve performance, I was pointing that out more for the sake of correctness.

sandys commented 3 years ago

Thank you. That makes total sense.

However, what is your opinion on the contextmanager stuff?


sm-Fifteen commented 3 years ago

However, what is your opinion on the contextmanager stuff?

It's a documented feature of FastAPI and generally a pretty good pattern to adopt for request-scoped resources.

However:

bharathanr commented 2 years ago

A couple of things i want to point out:

  • pool_size has a default value of 5 connections it will keep open at all times, and max_overflow has a default value of 10 connections it can open on demand beyond the pool's capacity, so your SQLAlchemy engine runs out of connections to use for serving requests after 15 and will block the thread of any other request that needs one while waiting for connections to be freed.
  • The ThreadPoolExecutor FastAPI is using to run non-async routes and dependencies without blocking the main thread (the one handling connections and running async code) has a capacity equal to the number of cores (both logical and physical) you have plus 4, maxing out at 32, and since all your functions are def and not async def (they kind of have to be, since SQLAlchemy won't get async support until version 1.4, and making blocking functions async def would only serve to block your event loop and stall your whole application instead of blocking one thread in your threadpool), each of these requests will occupy one thread in this thread pool for the entirety of its lifetime, so it may have trouble handling more requests than however many threads your ThreadPoolExecutor has.
  • SQLAlchemy's sessionmaker works using thread-local storage to keep track of sessions, which will work when using FastAPI in a fully sync way, but might cause you to run into issues when async gets thrown in because requests and threads aren't mapped 1:1 in FastAPI, which is why I usually recommend using Session(bind=db_conn) directly with a setup like you have.

You can't really handle more requests at any given time than the number of connections your connection pool can give you (15 connections for 100 requests means you have to start with 85 requests queued up), nor can you have more sync requests waiting than you have threads in your thread pool (if you have a 4-core + 4-"logical-thread" machine, that's 12 threads in your thread pool).

@sm-Fifteen now that SQLAlchemy has async support, I was wondering if Sessionmaker still uses thread locals when creating AsyncSession objects. I'd hope not, at least. Has your recommendation around using Session(bind=db_conn) in favour of Sessionmaker changed at all for async sessions? I'm asking this question under the assumption that:

  1. I do not have a global engine, and I instead tie it to app.state
  2. I do not use a global Sessionmaker, but use some kind of caching to ensure it is not created more than once

What my question amounts to is whether you still recommend following your approach from tiangolo/fastapi#726, and whether your comment in this issue on using Session directly instead of sessionmaker still applies today, if I'm doing the stuff I mentioned above.

Edit: Added some more references, and a summary.

sm-Fifteen commented 2 years ago

@sm-Fifteen now that SQLAlchemy has async support, I was wondering if Sessionmaker still uses thread locals when creating AsyncSession objects. I'd hope not, at least. Has your recommendation around using Session(bind=db_conn) in favour of Sessionmaker changed at all for async sessions? I'm asking this question under the assumption that:

1. I do not have a global engine, and I instead tie it to app.state

2. I do not use a global Sessionmaker, but use some kind of caching to ensure it is not created more than once

What my question amounts to is whether you still recommend following your approach from tiangolo/fastapi#726, and whether your comment in this issue on using Session directly instead of sessionmaker still applies today, if I'm doing the stuff I mentioned above.

Edit: Added some more references, and a summary.

@bharathanr: sessionmaker's main use case seems to be pre-configuring an alternate Session() constructor so you don't need to repeat that configuration elsewhere in the code. That wouldn't really be needed if you're using dependency injection for that, I don't think, so I would stick to recommending creating a Session directly inside your session dependency. Also, the documentation for scoped_session now actively discourages using it with async frameworks, so definitely don't use that.
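A minimal sketch of the "create a Session directly inside the dependency" pattern this comment recommends, using a stand-in `FakeSession` so the example is self-contained (with real SQLAlchemy you would construct `Session(bind=engine)` in its place):

```python
events = []

class FakeSession:
    """Stand-in for sqlalchemy.orm.Session, for illustration only."""
    def __init__(self, bind=None):
        events.append("open")
    def close(self):
        events.append("close")

engine = object()  # hypothetical engine, normally created at app startup

def get_db_sess():
    # Session configured inline, per request -- no sessionmaker, no
    # thread-local registry involved.
    sess = FakeSession(bind=engine)
    try:
        yield sess
    finally:
        sess.close()  # runs once the request is done with the session

# FastAPI drives generator dependencies roughly like this:
gen = get_db_sess()
sess = next(gen)   # request handling happens while the session is open
gen.close()        # triggers the finally block
print(events)      # -> ['open', 'close']
```

Each request gets a fresh session tied to its own dependency generator, which sidesteps the request-to-thread mapping problem entirely.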

oliviernguyenquoc commented 2 years ago

Hi !

I don't get it completely. Is this repo's session setup the way to go? Are there still performance issues?

I see that https://github.com/tiangolo/fastapi/issues/726 issue is closed.

I had to use FastAPI-utils to get rid of the famous error: sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 0 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r)

But I got some trouble to bind declarative_base with the engine.

If this repo still has performance issues, is there a way to build an application without this limitation by following the FastAPI docs, or must we wait for an update on the FastAPI side?

sm-Fifteen commented 2 years ago

@oliviernguyenquoc: You can't make an application completely free of that sort of limitation, since you'll always have a connection limit somewhere (if not in SQLAlchemy, then Postgres' default of 100 open connections, and if you removed that, the database would run out of memory spawning backends to handle connections if you kept pushing it). It's just a limited resource, like anything else.

The problem that's new to FastAPI is its specific mix of running async code on the main thread and sync code in a thread pool, which runs into issues that frameworks using the more traditional "one thread per request" model, like Flask, never needed to deal with (Armin Ronacher has a pretty good write-up on that): namely, that blocking I/O or processing anywhere you didn't expect it (such as checking out a connection from a connection pool, or maybe even closing it) will completely kill your server's performance (so any blocking I/O needs to be done in the thread pool), and also that the async loop being unaware of the thread pool's size means it can keep accepting connections even when the thread pool cannot (which will also kill your server's performance). This introduces new issues like tiangolo/fastapi#3205, which is about how a mismatch between your connection pool and your thread pool can lead to deadlocks when using SQLAlchemy in sync mode (async mode seems to solve this).
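The pool-mismatch point can be made concrete with a little arithmetic: under sync SQLAlchemy, every in-flight request holds a worker thread and, while querying, a pooled connection, so sustained concurrency is bounded by whichever pool is smaller (the function name below is ours; the numbers are the defaults discussed in this thread):

```python
def effective_sync_concurrency(pool_size: int, max_overflow: int,
                               thread_pool_size: int) -> int:
    # A sync request needs both a worker thread and (while querying) a
    # pooled DB connection, so the smaller limit wins; requests beyond
    # it queue up (or time out, as in the QueuePool error above).
    db_limit = pool_size + max_overflow
    return min(db_limit, thread_pool_size)

# SQLAlchemy defaults (5 + 10 connections) vs a 12-thread pool:
print(effective_sync_concurrency(5, 10, 12))  # -> 12 (threads are the bottleneck)
# ...vs the 32-thread cap on a bigger machine:
print(effective_sync_concurrency(5, 10, 32))  # -> 15 (connections are the bottleneck)
```

When the two limits are far apart in either direction, the surplus resource sits idle while requests queue on the scarce one, which is the setup for the deadlocks described in tiangolo/fastapi#3205.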

I see that https://github.com/tiangolo/fastapi/issues/726 issue is closed.

I had to use FastAPI-utils get rid of the famous error:

But I got some trouble to bind declarative_base with the engine.

I would still recommend the configuration from tiangolo/fastapi#726. You also don't need to bind your declarative_base objects to a SQL Alchemy engine or an existing Metadata object for them to work, you can just create an unbound declarative base.

from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base

MyDBBase = declarative_base(name='MyDBBase')

class FooDBModel(MyDBBase):
    __tablename__ = 'foo'
    __table_args__ = {'schema': 'example'}
    foo_id = Column(Integer, primary_key=True)

The engine is only needed when doing create_all() or manipulating the ORM data, the latter of which is done through a Session, which you can pass as a dependency in your FastAPI routes.

harsh8398 commented 2 years ago

Adding to this discussion:

I have performed some tests with synchronous SQLAlchemy session and have documented my findings in this repo: https://github.com/harsh8398/fastapi-dependency-issue

I haven't tested with the async SQLAlchemy session, but from my tests it seems that the problem is in how FastAPI implements Depends.

Summary of my load testing: using Depends, average request throughput per second is in the single digits, with some failed requests. Without Depends, it's more than a thousand requests per second with zero failed requests.

sandys commented 2 years ago

@sm-Fifteen in your comment in the other bug, you mentioned SQLAlchemy 1.4 as one of the solutions to this issue.

In that context, and given SA 1.4 is stable, what would be the recommended configuration for FastAPI now?

sm-Fifteen commented 2 years ago

I have performed some tests with synchronous SQLAlchemy session and have documented my findings in this repo: https://github.com/harsh8398/fastapi-dependency-issue

I haven't tested with the async SQLAlchemy session, but from my tests it seems that the problem is in how FastAPI implements Depends.

@harsh8398: Those are some pretty troubling numbers; I wasn't aware that this cocktail of CM dependencies, thread pools and blocking I/O could kill server performance this badly. That certainly makes a convincing argument against using sync SQLAlchemy in FastAPI applications, and makes the discussion in tiangolo/fastapi#3205 extra important. I was going to say that "the docs are luckily unaffected by this as they still keep their SQLA objects in the global namespace", but on closer look, the docs do use CM deps to initialize ORM sessions, and haven't been updated to use async SQLA, so this problem also affects the official documentation.

In that context, and given SA 1.4 is stable, what would be the recommended configuration for fastapi now ?

@sandys: I haven't had the opportunity to try async SQLA on the FastAPI applications I'm working on at the moment, so I can't say.

adriangb commented 2 years ago

https://github.com/harsh8398/fastapi-dependency-issue/pull/2

The issue seems to be hitting the default capacity limiter for the threadpool used to run sync dependencies/endpoints.

sandys commented 2 years ago

@adriangb on the other hand, I want to submit my code:

https://gist.github.com/sandys/671b8b86ba913e6436d4cb22d04b135f

With a single gunicorn worker, I'm getting the following performance:

 docker run --network=host --rm skandyla/wrk -t12 -c400 -d30s http://localhost:8000/users/
Running 30s test @ http://localhost:8000/users/
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   161.88ms   64.40ms   1.93s    88.77%
    Req/Sec    49.64     29.37   131.00     49.80%
  8700 requests in 30.09s, 2.17MB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:    289.16
Transfer/sec:     73.70KB

P.S. This uses Depends.

Also, the endpoint is actually a database insert, not just a database query.

adriangb commented 2 years ago

There's way too much code in there for me to go through. I'm also not sure why you're tagging me?