meddario opened this issue 4 years ago
A couple of things I want to point out:

- `pool_size` has a default value of 5 connections it will keep open at all times, and `max_overflow` has a default value of 10 connections it can open on demand beyond the pool's capacity, so your SQLAlchemy engine runs out of connections to use for serving requests after 15 and will block the thread of any other requests that need one while waiting for connections to be freed.
- The ThreadPoolExecutor FastAPI is using to run non-async routes and dependencies without blocking the main thread (the one handling connections and running async code) has a capacity equal to the number of cores (both logical and physical) you have plus 4, maxing out at 32, and since all your functions are `def` and not `async def` (they kind of have to be, since SQLAlchemy won't get async support until version 1.4, and making blocking functions `async def` would only serve to block your event loop and stall your whole application instead of blocking one thread in your threadpool), each of these requests will occupy one thread in this thread pool for the entirety of its lifetime, so it may have trouble handling more requests than however many threads your ThreadPoolExecutor has.
- SQLAlchemy's `sessionmaker` works using thread-local storage to keep track of sessions, which will work when using FastAPI in a fully sync way, but might cause you to run into issues when async gets thrown in because requests and threads aren't mapped 1:1 in FastAPI, which is why I usually recommend using `Session(bind=db_conn)` directly with a setup like you have.

You can't really handle more requests at any given time than the number of connections your connection pool can give you (15 connections for 100 requests means you have to start with 85 requests queued up), nor can you have more sync requests waiting than you have threads in your thread pool (if you have a 4 core + 4 "logical threads" machine, that's 12 threads in your thread pool).
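For reference, both of those pool defaults can be raised when creating the engine; a minimal sketch (the numbers and DSN are placeholders, not recommendations):

```python
from sqlalchemy import create_engine

# pool_size: connections kept open at all times (default 5)
# max_overflow: extra connections that may be opened on demand (default 10)
engine = create_engine(
    "postgresql://user:password@localhost/mydb",  # placeholder DSN
    pool_size=15,
    max_overflow=15,
    pool_pre_ping=True,
)
```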
Hi @sm-Fifteen, thank you very much for the additional information (I am not a FastAPI expert, I was just trying to make sense of these results).
So, basically:

* Using `sessionmaker` is correct as long as everything is sync
* If I go over 32 simultaneous connections with my last example, I will get an error because there wouldn't be available threads to keep the requests waiting?

Thanks and kind regards!
> So, basically:
> * Using `sessionmaker` is correct as long as everything is sync

Given how FastAPI works (with dependencies being injected on demand, so ideally you wouldn't have a whole lot of stuff existing outside of functions, which is `sessionmaker`'s case), I would say that using `Session` directly is better regardless of whether your requests are sync or async. Again, see my comment on https://github.com/tiangolo/fastapi/issues/726#issuecomment-584371305. (A rough sketch follows at the end of this comment.)

> * If I go over 32 simultaneous connections with my last example, I will get an error because there wouldn't be available threads to keep the requests waiting?

No, if you go over that limit, the async thread will most likely stall and pause handling incoming connections while waiting for a thread to be freed. If your application is behind a reverse proxy, that would likely cause you to run into 502 errors (bad gateway) after a few seconds as the proxy tried contacting the application and the application didn't even "pick up the phone" (if you want a more visual metaphor), which is what usually happens when the event loop blocks.
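Going back to the first point, here is a rough sketch of what "using `Session` directly" can look like as a pair of dependencies (this is not the exact code from the linked comment; the DSN and names are placeholders):

```python
from fastapi import Depends, FastAPI
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

app = FastAPI()
engine = create_engine("postgresql://user:password@localhost/mydb", pool_pre_ping=True)  # placeholder DSN

def get_db_conn():
    # One pooled connection per request, returned to the pool afterwards
    with engine.connect() as conn:
        yield conn

def get_db_sess(conn=Depends(get_db_conn)):
    # Session bound to that connection: no sessionmaker, no thread-local storage
    sess = Session(bind=conn)
    try:
        yield sess
    finally:
        sess.close()

@app.get("/items")
def list_items(db: Session = Depends(get_db_sess)):
    ...
```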
> Given how FastAPI works (with dependencies being injected on demand, so ideally you wouldn't have a whole lot of stuff existing outside of functions, which is `sessionmaker`'s case), I would say that using `Session` directly is better regardless of whether your requests are sync or async. Again, see my comment on tiangolo/fastapi#726 (comment).

Thanks, do you have an example of using `Session` directly? I see this in one of your comments in the thread you linked: https://github.com/tiangolo/fastapi/issues/726#issuecomment-557687526

> No, if you go over that limit, the async thread will most likely stall and pause handling incoming connections while waiting for a thread to be freed. If your application is behind a reverse proxy, that would likely cause you to run into 502 errors (bad gateway) after a few seconds as the proxy tried contacting the application and the application didn't even "pick up the phone" (if you want a more visual metaphor), which is what usually happens when the event loop blocks.

Ok, this is clear, thank you!
Bringing over my comments from https://github.com/tiangolo/fastapi/issues/726:
Starlette discourages `on_*` handlers, and @sm-Fifteen, `sessionmaker` is a thin wrapper around `Session` that passes through default arguments, including the engine.
There seems to be a lot of confusion about the best way to set up and tear down DB connections, and whether some of these ways are performance traps.
@tiangolo Could you provide guidance on how to best manage the DB lifecycle in consideration of all the various factors (`--reload`, only opening a session on routes that need it, using a connection pool, cleanly closing/rolling back transactions, etc.)?
A few additional questions:

1) Why is `autoflush=False` included in FastAPI examples? What goes wrong when setting this to true?
2) Assuming that yielding a new session is the recommended method, is there a reason to prefer `async def` vs `def`?
3) Is there any reason to prefer `Depends(get_db)` vs `with get_db() as session` (or `async with`)?
4) Should the generator do a `session.commit()` and `except: session.rollback()`, or should this be explicit in the endpoints or handled implicitly by SQLAlchemy's teardown?
Putting it all together, we have something like this:

```python
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

from .core import settings

# Does not connect until used, pool_pre_ping applies when requesting a pool connection
engine = create_engine(settings.DB.DSN, pool_pre_ping=True)
_Session = sessionmaker(
    bind=engine,
    autoflush=False,
)


def close_engine():
    # For lifecycle shutdown
    engine.dispose()


@contextmanager
def get_db():
    # Explicit type because sessionmaker.__call__ stub is Any
    session: Session = _Session(bind=engine)
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()
```
Usage:

```python
from fastapi import Depends, FastAPI
from sqlalchemy.orm import Session

from . import db            # assuming the snippet above lives in db.py
from .db import get_db


def on_shutdown():
    db.close_engine()


app = FastAPI(
    ...,
    # TODO lifespan=... when supported
    on_shutdown=[on_shutdown],
)


@app.get('/foo')
def foo(db: Session = Depends(get_db)):
    ...


@app.get('/foo2')
def foo2():
    with get_db() as session:
        ...

# Or as above but with async
```
> Starlette discourages `on_*` handlers, and @sm-Fifteen `sessionmaker` is a thin wrapper around `Session` that passes through default arguments, including the engine.

@pikeas: `on_*` events aren't discouraged yet to my knowledge, they're now hooks for the `lifespan` context manager (see encode/starlette#799) which replaces them. As of right now, though, that new feature hasn't been integrated into FastAPI yet (I'm not even sure if it's documented at all in Starlette), which would be the ideal solution for tiangolo/fastapi#617. In the meantime, unless you want to try using the context manager approach by hand, I'd still recommend the approach from https://github.com/tiangolo/fastapi/issues/726#issuecomment-557687526.
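For the curious, that lifespan approach ends up looking roughly like this once supported (a minimal sketch; newer FastAPI versions accept a `lifespan=` parameter, which wasn't available when this was written, and the DSN is a placeholder):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from sqlalchemy import create_engine

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: create the engine and stash it on the application state
    app.state.engine = create_engine("sqlite://")  # placeholder DSN
    yield
    # Shutdown: dispose of the connection pool cleanly
    app.state.engine.dispose()

app = FastAPI(lifespan=lifespan)
```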
> A few additional questions:
>
> 2. Assuming that yielding a new session is the recommended method, is there a reason to prefer `async def` vs `def`?
FastAPI tries to run routes and dependencies that are defined as `def` functions in a threadpool to avoid blocking, but if you can guarantee that a dependency will never block for any amount of time under any circumstances, it's slightly faster to use `async def`, since that makes it run on the same thread. If you look at my example comment on the other issue, the only such `async def` I use is this one:

```python
from typing import Optional

from databases import Database  # encode/databases, as used in the linked example

_db_conn: Optional[Database]

async def get_db_conn() -> Database:
    assert _db_conn is not None
    return _db_conn
```

...because spinning up a thread for such a trivial getter would definitely be wasteful, and it's not calling any function that I can't guarantee won't block.
If that's what you were asking, given how your `get_db()` calls a lot of potentially blocking session methods, I would advise against using `async def`.
> 3. Is there any reason to prefer `Depends(get_db)` vs `with get_db() as session` (or `async with`)?
It means you can actually call the function that takes `Depends(get_db)` as a default parameter with a different database by passing the object as a parameter and not have to change the code anywhere else, which can be useful for things like testing with a mock DB or function reuse.
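For example, FastAPI's `dependency_overrides` lets a test swap the session dependency without touching the routes; a sketch, reusing the `get_db` and `app` names from the snippets above (the test engine is hypothetical):

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

test_engine = create_engine("sqlite://")  # hypothetical in-memory test database

def override_get_db():
    session = Session(bind=test_engine)
    try:
        yield session
    finally:
        session.close()

# Every route that declares Depends(get_db) now receives the test session instead
app.dependency_overrides[get_db] = override_get_db
```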
> 4. Should the generator do a `session.commit()` and `except: session.rollback()`, or should this be explicit in the endpoints or handled implicitly by SQLAlchemy's teardown?
I believe rollback is always implicit if you exit a session without committing, and I personally prefer explicit commits on the caller's side over doing it implicitly via the context manager, because it gives the caller more control over how the transaction flows (and it lets you do quick and dirty smoke tests by running a route that writes to the DB and just commenting out the part that commits the transaction).
Thanks for the thorough reply! One quick note about `on_*`...
> > Starlette discourages `on_*` handlers, and @sm-Fifteen `sessionmaker` is a thin wrapper around `Session` that passes through default arguments, including the engine.
>
> @pikeas: `on_*` events aren't discouraged yet to my knowledge, they're now hooks for the `lifespan` context manager (see encode/starlette#799) which replaces them. As of right now, though, that new feature hasn't been integrated into FastAPI yet (I'm not even sure if it's documented at all in Starlette), which would be the ideal solution for tiangolo/fastapi#617. In the meantime, unless you want to try using the context manager approach by hand, I'd still recommend the approach from tiangolo/fastapi#726 (comment).
It's actually right in Starlette's code at https://github.com/encode/starlette/blob/master/starlette/applications.py#L114-L115:

> The following usages are now discouraged in favour of configuration during `Starlette.__init__(...)`
@sm-Fifteen is your recommendation here benchmarked? https://github.com/tiangolo/fastapi/issues/726#issuecomment-557687526
Will it work in all situations without lockups, etc.? We are planning to use this with PostgreSQL.
> @sm-Fifteen is your recommendation here benchmarked? tiangolo/fastapi#726 (comment)

I have not run any load-testing benchmarks on it, no. When I posted this comment, it was more of an architectural suggestion to avoid initializing anything in the global scope (outside the ASGI application's scope) to take advantage of dependency injection everywhere.
> Will it work in all situations without lockups, etc.? We are planning to use this with PostgreSQL.

@sandys: `open_database_connection_pools` and `close_database_connection_pools` are blocking, but run outside the normal flow of your server, and the async `get_db_conn` is just a "scoped singleton" getter, so it will not block. You may want to check `get_db_sess`, since I'm not sure whether or not the `Session` constructor blocks, and `sess.close()` probably does (at least when it needs to roll back), so it would probably be safer to mark this as `def` rather than `async def`.

Like I mentioned in https://github.com/tiangolo/full-stack-fastapi-postgresql/issues/290#issuecomment-723155540, another thing you'll want to check when testing for performance is whether FastAPI's threadpool ends up filling itself up when trying to offload the main thread of all the non-`async def`s you're running, which may also run you into performance problems, though I'm saying that from having looked at the code, not from having explicitly tested that case myself.
> It's actually right in Starlette's code at https://github.com/encode/starlette/blob/master/starlette/applications.py#L114-L115
>
> The following usages are now discouraged in favour of configuration during `Starlette.__init__(...)`

@pikeas: Ah, that's regarding the declarative API Starlette added in 0.13 (see encode/starlette#704), which was agreed to be a poor fit for FastAPI when the question of supporting it was brought up in tiangolo/fastapi#687 (I think the API is great, it's just that it wouldn't mesh very well with FastAPI's very function-oriented style, see my comment in that thread). The best approach in the future would be the context manager approach I mentioned earlier, which has been implemented in Starlette, but still needs support to be added on the FastAPI side of things.
@sm-Fifteen thanks for this! Your comment here is pure gold... as well as the previous one https://github.com/tiangolo/full-stack-fastapi-postgresql/issues/290#issuecomment-723155540 It's the first discussion of production performance tuning of FastAPI!
Would you be willing to add some color to this? I know it may be asking too much, but how to turn the knobs would be very welcome (and there seem to be quite a few of them).
@sandys: Unfortunately, that's about the limit of what I know regarding FastAPI performance. You can probably further reduce blocking by starting uvicorn with `--workers=4` or some other number to spawn multiple processes of your application, but that's not something I've tested myself either, I'm just aware of it.
FastAPI could really use a "Performance" section in its documentation, you might want to mention that in the FastAPI developer survey if you haven't taken it yet.
@sm-Fifteen I have a new way to do it instead of a global sqlalchemy engine - using `lru_cache` and a contextmanager.
https://gist.github.com/sandys/671b8b86ba913e6436d4cb22d04b135f
Do you think it works just as well?
`@lru_cache()` on a function with no parameters only creates the database engine as a (lazily initialized) global singleton, which is basically equivalent to just running `create_engine` in the global scope. The main difference between the two is that the first request calling `get_engine()` will have to wait for the connection to be established, as that step is no longer performed before the FastAPI application goes up.

The reason I was recommending `on_event()` lifecycle managers earlier is because these make sure that your SQLAlchemy engine is initialized before the application runs, but also that it's managed with the lifecycle of the application, which enables your ASGI server, like Uvicorn, to run proper cleanup of both sync and async resources before exiting, instead of assuming that it will happen automatically (which it might not: Python tries to run garbage collection before exiting but this is not perfect, and uvicorn's `--reload` option means your code gets reloaded without exiting the Python interpreter first). `lru_cache` doesn't really resolve that, it merely makes sure that `get_engine()` only runs once.

With that said, the exact way you initialize your DB engine isn't how you're going to improve performance, I was pointing that out more for the sake of correctness.
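For comparison, a minimal sketch of the `on_event()` lifecycle approach being described (the `app.state` attribute name and the DSN are placeholders):

```python
from fastapi import FastAPI
from sqlalchemy import create_engine

app = FastAPI()

@app.on_event("startup")
def open_engine():
    # Runs before the application starts serving requests
    app.state.engine = create_engine("sqlite://")  # placeholder DSN

@app.on_event("shutdown")
def close_engine():
    # Runs when the ASGI server shuts the application down
    app.state.engine.dispose()
```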
Thank you. That makes total sense.
However, what is your opinion on the contextmanager stuff?
> However, what is your opinion on the contextmanager stuff?

It's a documented feature of FastAPI and generally a pretty good pattern to adopt for request-scoped resources.
However:

- Your dependencies don't need the `@contextmanager` annotation, FastAPI can handle them just fine without.
- I would rather do the `sess.commit()` at the end of your route function and keep `get_db()` as a simple `try-yield-finally` like in the doc.
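Applied to the `get_db()` from earlier in this thread, that amounts to something like this (a sketch reusing the `_Session`, `engine` and `app` names from the snippets above):

```python
def get_db():
    # Plain generator dependency: no @contextmanager, no commit/rollback in here
    session: Session = _Session(bind=engine)
    try:
        yield session
    finally:
        session.close()

@app.get('/foo')
def foo(db: Session = Depends(get_db)):
    ...            # do the actual work
    db.commit()    # explicit commit on the caller's side
```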
> A couple of things I want to point out:
>
> - `pool_size` has a default value of 5 connections it will keep open at all times, and `max_overflow` has a default value of 10 connections it can open on demand beyond the pool's capacity, so your SQLAlchemy engine runs out of connections to use for serving requests after 15 and will block the thread of any other requests that need one while waiting for connections to be freed.
> - The ThreadPoolExecutor FastAPI is using to run non-async routes and dependencies without blocking the main thread (the one handling connections and running async code) has a capacity equal to the number of cores (both logical and physical) you have plus 4, maxing out at 32, and since all your functions are `def` and not `async def` (they kind of have to be, since SQLAlchemy won't get async support until version 1.4, and making blocking functions `async def` would only serve to block your event loop and stall your whole application instead of blocking one thread in your threadpool), each of these requests will occupy one thread in this thread pool for the entirety of its lifetime, so it may have trouble handling more requests than however many threads your ThreadPoolExecutor has.
> - SQLAlchemy's `sessionmaker` works using thread-local storage to keep track of sessions, which will work when using FastAPI in a fully sync way, but might cause you to run into issues when async gets thrown in because requests and threads aren't mapped 1:1 in FastAPI, which is why I usually recommend using `Session(bind=db_conn)` directly with a setup like you have.
>
> You can't really handle more requests at any given time than the number of connections your connection pool can give you (15 connections for 100 requests means you have to start with 85 requests queued up), nor can you have more sync requests waiting than you have threads in your thread pool (if you have a 4 core + 4 "logical threads" machine, that's 12 threads in your thread pool).
@sm-Fifteen now that SQLAlchemy has async support, I was wondering if `sessionmaker` still uses thread locals when creating `AsyncSession` objects. I'd hope not, at least. Has your recommendation around using `Session(bind=db_conn)` in favour of `sessionmaker` changed at all for async sessions? I'm asking this question under the assumption that:

1. I do not have a global engine, and I instead tie it to `app.state`
2. I do not use a global `sessionmaker`, but use some kind of caching to ensure it is not created more than once

What my question amounts to is whether you still recommend following your approach from tiangolo/fastapi#726, and whether your comment in this issue on using `Session` directly instead of `sessionmaker` still applies today, if I'm doing the stuff I mentioned above.
Edit: Added some more references, and a summary.
> @sm-Fifteen now that SQLAlchemy has async support, I was wondering if `sessionmaker` still uses thread locals when creating `AsyncSession` objects. I'd hope not, at least. Has your recommendation around using `Session(bind=db_conn)` in favour of `sessionmaker` changed at all for async sessions? I'm asking this question under the assumption that:
>
> 1. I do not have a global engine, and I instead tie it to `app.state`
> 2. I do not use a global `sessionmaker`, but use some kind of caching to ensure it is not created more than once
>
> What my question amounts to is whether you still recommend following your approach from tiangolo/fastapi#726, and whether your comment in this issue on using `Session` directly instead of `sessionmaker` still applies today, if I'm doing the stuff I mentioned above.
@bharathanr: `sessionmaker`'s main use case seems to be pre-configuring an alternate `Session()` constructor so you don't need to repeat that configuration elsewhere in the code. That wouldn't really be needed if you're using dependency injection for that, I don't think, so I would stick to recommending creating a `Session` directly inside your session dependency. Also, the documentation for `scoped_session` now actively discourages using it with async frameworks, so definitely don't use that.
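A rough sketch of what that can look like with async SQLAlchemy (1.4+), creating the session directly inside the dependency instead of going through `sessionmaker` or `scoped_session` (the driver and DSN are placeholders):

```python
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine

async_engine = create_async_engine("postgresql+asyncpg://user:password@localhost/mydb")  # placeholder DSN

async def get_db():
    # One AsyncSession per request; the async context manager closes it afterwards
    async with AsyncSession(bind=async_engine, expire_on_commit=False) as session:
        yield session
```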
Hi!
I don't get it completely. Is this repo's session setup the way to go? Are there still performance issues?
I see that the https://github.com/tiangolo/fastapi/issues/726 issue is closed.
I had to use FastAPI-utils to get rid of the famous error:

```
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 0 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r)
```

But I got some trouble binding `declarative_base` to the engine.
If this repo still has performance issues, is there a way to make an application without this limitation thanks to the FastAPI docs, or must we wait for an update on the FastAPI side?
@oliviernguyenquoc: You can't make an application completely free of that sort of limitation, since you'll always have a connection limit somewhere (if not on SQLAlchemy, then Postgres' default of 150 open connections, and if you removed that, then said database will run out of memory opening threads to handle connections if you keep pushing it). It's just a limited resource, like anything else.
The problem that's new to FastAPI is its specific mix of running async on the main thread and running sync in a thread pool, which runs into issues that frameworks using the more traditional "one thread per request" model like Flask never needed to deal with (Armin Ronacher has a pretty good write-up on that), namely that blocking I/O or processing anywhere you didn't expect it (such as checking out a connection from a connection pool, or maybe even closing it) will completely kill your server's performance (but then any blocking I/O needs to be done in the thread pool), and also that the async loop being unaware of the thread pool's size means it can keep accepting connections even when the thread pool cannot (which will also kill your server's performance). This introduces new issues like tiangolo/fastapi#3205, which is about how a mismatch between your connection pool and your thread pool can lead to deadlocks when using SQLAlchemy in sync mode (async mode seems to solve this).
> I see that the https://github.com/tiangolo/fastapi/issues/726 issue is closed.
> I had to use FastAPI-utils to get rid of the famous error:
> But I got some trouble binding `declarative_base` to the engine.

I would still recommend the configuration from tiangolo/fastapi#726. You also don't need to bind your `declarative_base` objects to a SQLAlchemy engine or an existing `MetaData` object for them to work, you can just create an unbound declarative base.
```python
from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base

MyDBBase = declarative_base(name='MyDBBase')

class FooDBModel(MyDBBase):
    __tablename__ = 'foo'
    __table_args__ = {'schema': 'example'}

    foo_id = Column(Integer, primary_key=True)
```
The engine is only needed when doing `create_all()` or manipulating the ORM data, the latter of which is done through a `Session`, which you can pass as dependencies in your FastAPI routes.
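For instance, `create_all()` can be pointed at whichever engine exists at that moment (a sketch; the SQLite DSN is just a placeholder):

```python
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # placeholder DSN
MyDBBase.metadata.create_all(bind=engine)  # MyDBBase from the snippet above
```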
Adding to this discussion:
I have performed some tests with a synchronous SQLAlchemy session and have documented my findings in this repo: https://github.com/harsh8398/fastapi-dependency-issue
I haven't tested with the async SQLAlchemy session, but from my tests it seems that the problem is within how FastAPI implements `Depends`.
Summary of my load testing: using `Depends`, average request throughput per second is a single digit with some failed requests, whereas without `Depends` it's more than a thousand requests per second with zero failed requests.
@sm-Fifteen in your comment in the other bug, you mentioned SQLAlchemy 1.4 as one of the solutions to this issue.
In that context, and given SA 1.4 is stable, what would be the recommended configuration for FastAPI now?
> I have performed some tests with a synchronous SQLAlchemy session and have documented my findings in this repo: https://github.com/harsh8398/fastapi-dependency-issue
> I haven't tested with the async SQLAlchemy session, but from my tests it seems that the problem is within how FastAPI implements `Depends`.

@harsh8398: Those are some pretty troubling numbers, I wasn't aware that this cocktail of CM dependencies, thread pools and blocking I/O could kill server performance this badly. That certainly makes a convincing argument against using sync SQLAlchemy in a FastAPI application, and makes the discussion in tiangolo/fastapi#3205 extra important. I was going to say that "the docs are luckily unaffected by this as they still keep their SQLA objects in the global namespace", but on closer look, the docs do use CM deps to initialize ORM sessions, and haven't been updated to use async SQLA, so this problem also affects the official documentation.
> In that context, and given SA 1.4 is stable, what would be the recommended configuration for FastAPI now?

@sandys: I haven't had the opportunity to try async SQLA on the FastAPI applications I'm working on at the moment, so I can't say.
https://github.com/harsh8398/fastapi-dependency-issue/pull/2
The issue seems to be hitting the default capacity limiter for the threadpool used to run sync dependencies/endpoints.
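For anyone wanting to experiment with that knob, one way to raise the limit is to resize AnyIO's default thread limiter at startup; a sketch under the assumption that Starlette offloads sync work through `anyio.to_thread`, whose default limiter allows 40 threads:

```python
import anyio
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def raise_thread_limit():
    # Resize the capacity limiter used for all to_thread.run_sync() calls
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 100  # arbitrary number; size it against your DB pool
```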
@adriangb on the other hand, I want to submit my code
https://gist.github.com/sandys/671b8b86ba913e6436d4cb22d04b135f
With a single gunicorn worker, I'm getting the following performance:

```
docker run --network=host --rm skandyla/wrk -t12 -c400 -d30s http://localhost:8000/users/
Running 30s test @ http://localhost:8000/users/
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   161.88ms   64.40ms    1.93s   88.77%
    Req/Sec    49.64     29.37    131.00    49.80%
  8700 requests in 30.09s, 2.17MB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:    289.16
Transfer/sec:     73.70KB
```

P.S. this uses `Depends`. Also, the endpoint is actually a database insert, not just a database query.
There's way too much code in there for me to go through. I'm also not sure why you're tagging me?
Hi guys,
this is based on the discussion here: https://github.com/tiangolo/full-stack-fastapi-postgresql/issues/104 .
I tried a few benchmarks, with:

* `Depends`
* a `SessionManager`
* a `scoped_session` that I'll describe

I found out that - as seen by others - `Depends` doesn't work under load, it just exhausts the connections. So I'll leave out this solution. `SessionManager` works, but I found out an (apparently) more performant solution.

`SessionManager` solution (only relevant parts):

`scoped_session` solution (only relevant parts):

Results: Doing some tests with jmeter, the second implementation seems faster, for example, with 20 concurrent threads in jmeter (left: SessionMaker, right: scoped_session):

My question is: is the approach with `scoped_session` correct or is there some flaw that I can't see? I didn't get any connection errors during tests even with 100 threads. I am not a FastAPI expert (or an expert Python programmer), so if there is some silly error, I would be happy to know! Thanks in advance!