fastapi / fastapi

FastAPI framework, high performance, easy to learn, fast to code, ready for production
https://fastapi.tiangolo.com/
MIT License
77.28k stars 6.61k forks source link

Further develop startup and shutdown events #617

Open sm-Fifteen opened 5 years ago

sm-Fifteen commented 5 years ago

While the documentationn for FastAPI is in general extremely solid, there's a weakpoint that I feel hints at some underdevelopped feature within the framework, and that's startup and shutdown events. They are briefly mentionned (separately) with the startup event in particular being demonstrated like this :

items = {}

@app.on_event("startup")
async def startup_event():
    items["foo"] = {"name": "Fighters"}
    items["bar"] = {"name": "Tenders"}

@app.get("/items/{item_id}")
async def read_items(item_id: str):
    return items[item_id]

...which could very well be written like this:

items = {
    "foo": {"name": "Fighters"},
    "bar": {"name": "Tenders"}
}

@app.get("/items/{item_id}")
async def read_items(item_id: str):
    return items[item_id]

...and therefore makes the feature look useless. The example for shutdown instead uses logging as an example, which makes it look like this would be the primary purposes for those events, while in reality, it's not.

Is your feature request related to a problem? Please describe. The problem is that, throughout the entire documentation, things like database connections are created in the global scope, at module import. While this would be fine in a regular Python application, this has a number of problems, especially with objects that have a side-effect outside the code itself, like database connections. To demonstrate this, I've made a test structure that creates a lock file when initialized and deletes it when garbage collected.

Using it like this:

from fastapi import FastAPI
from lock import FileLock

app = FastAPI()
lock = FileLock("fastapi")

@app.get("/")
async def root():
    return {"message": "Hello World"}

...does not work and the lock is not deleted before shutdown (I was actually expecting it to be closed properly, like SQLAlchemy does with its connections, but clearly there's a lot of extra magic going on with SQLAlchemy that I don't even come close to understanding). This is also extremely apparent when using the --reload option on Uvicorn, bcause the lock is also not released when the modules are reloaded, causing the import to fail and the server to crash. This would be one thing, but I've had a similar incident occur some time ago when, while developping in reload mode, I've actually managed to take up every connection on my PostgreSQL server because of that problem, since while SQLAlchemy is smart enough to cleanup on exit where my FileLock cannot, the same does not happen when hot-reloading code.

So that would be one thing; the documentation should probably go into more details about what those startup and shutdown events are for (the Starlette documentation is a little more concrete about this, but no working code is given to illustrate this) and that should also be woven with the chapters about databases and such to make sure people don't miss it.

Except... That's not super ergonomic, now, is it?

from fastapi import FastAPI, Depends
from some_db_module import Connection

app = FastAPI()
_db_conn: Connection

@app.on_event("startup")
def take_lock():
    global _db_conn
    _db_conn = Connection("mydb:///")

@app.on_event("shutdown")
def release_lock():
    global _db_conn
    _db_conn.close()

def get_db_conn():
    return _db_conn

@app.get("/")
async def root(conn: Connection = Depends(get_db_conn)):
    pass

This is basically just a context manager split into two halves, linked together by a global variable. A context manager that will be entered and exited when the ASGI lifetime protocol notifies that the application has started and stopped. A context manager whose only job will be to initialize a resource to be either held while the application is running or used as an injectable dependency. Surely there's a cleaner way to do this.

Describe the solution you'd like I've been meaning to file this bug for a few weeks now, but what finally got me to do it is the release of FastAPI 0.42 (Good job, everyone!), which has context managers as dependencies as one of its main new features. Not only that, but the examples being given are pretty much all database-related, except connections are opened and closed for each call of each route instead of being polled like SQLAlchemy (and I assume encode's async database module too). Ideally, events should be replaced with something like this, but where the dependencies are pre-initialized instead of being created on the spot. Maybe by having context managers that are started and stopped based on startup and shutdown events and yield "factory functions" that could in turn be called during dependency injection to get the object that needs to be passed.

Something along those lines:

from fastapi import FastAPI, Depends
from sqlalchemy import create_engine

app = FastAPI()

@app.lifetime_dependency
def get_db_conn():
    conn_pool = create_engine("mydb:///")

    # yield a function that closes around db_conn
    # and returns it as a dependency when called
    yield lambda: conn_pool

    conn_pool.close()

@app.get("/")
async def root(conn: Connection = Depends(get_db_conn)):
    pass

Additional context Not sure where else to mention it, but I've ran into cases where the shutdown event does not get called before exiting, namely when using the VSCode debugger on Windows and stopping or restarting the application via the debugger's controls (haven't tried this on Linux yet). This apparently kills the thread without any sort of cleanup being performed and leaves all database connections open (and possibly unable to timeout, since DBAPI appears to suggest that all queries to be executed as part of a transaction, which most drivers do, and mid-transaction timeout is disabled by default on at least PostgreSQL). I don't think there is any way that could be fixed, though it should probably be mentionned somewhere, either there or in debugging part of the documentation.

UltimateLobster commented 2 months ago

Seeing the lack of movement in this issue, I decided to work on a PR for this myself. Specifically, a PR which will add support to define lifespan-scoped dependencies. However, I'm not sure what would be the best user-facing API for this. I have a few options in my mind and I would like to hear your feedback.

Option 1 - Add a is_lifespan_scoped argument to Depends

A new boolean argument would be added to Depends called is_lifespan_scoped. If it's False (the default), the dependency would behave as the current dependencies. If it's True, the dependency would be setup and teared down at the scope of the application's lifespan.

Option 2 - Add a dependency_scope argument to Depends

A new string argument would be added to Depends called dependency_scope. The argument would be one of only two supported options at the moment:

Option 3 - Add a cache_scope argument to Depends

A new string argument would be added to Depends called cache_scope. The argument would not be relevant when the use_cache argument is False. It would affect the scope in which the dependency cache is saved and can be one of 2 options at the moment:

Option 4 - A completely different annotation: LifespanDepends

Instead of using the existing Depends, the user would need a whole new annotation called LifespanDepends. It would have its own parameters (we may or may not choose to support an argument simillar to the use_cache argument). Do note, we currently have a Security annotation that inherits from Depends and therefore, if the need arise, we would need to create a lifespan scoped equivallent

Option 5 - An app scoped decorator

We would use dependencies the exact same way as today. But in order to define them, we would have to decorate them with a decorator exposed by the FastAPI object. This is pretty straight forward, but it also makes the definition of lifespan scoped dependenceis coupled to the application definition. The dependency would have to be defined in the same place as the application, or be direcly imported from a separate file and manually decorated.

Personally I lean towards options 2 and 4. They are explicit, would have a simillar behavior as today, and yet, they allow for further developments in future (like using dependencies in custom endpoints).

Would love to hear your opinion and feedback before I go ahead and implement the feature.

jbalonso commented 2 months ago

Personally I lean towards options 2 and 4. They are explicit, would have a simillar behavior as today, and yet, they allow for further developments in future (like using dependencies in custom endpoints).

Would love to hear your opinion and feedback before I go ahead and implement the feature.

I wonder if this should be turned into a Github Discussion, but I'll go ahead and share my thoughts here.

In the spirit of avoiding intensely disparate code paths, I think we should be sensitive to the existing lifespan= keyword argument and not have completely different ways of configuring dependency-powered lifespans, so I'll add an option:

Option 6 - Accept Depends objects for lifespan argument in FastAPI (app) and APIRouter constructors

We can have app = FastAPI(lifespan=lifespan_ctx_mgr), why not app = FastAPI(lifespan=Depends(lifespan_generator_func))?

Perhaps we can buttress the dependency satisfaction chain with a synthetic path operation using sentinel non-str values for the path?

This possibly falls apart due to module initialization order or other circular dependency issues, but if we carry this to APIRouter, it should be navigable.

UltimateLobster commented 2 months ago

Personally I lean towards options 2 and 4. They are explicit, would have a simillar behavior as today, and yet, they allow for further developments in future (like using dependencies in custom endpoints). Would love to hear your opinion and feedback before I go ahead and implement the feature.

I wonder if this should be turned into a Github Discussion, but I'll go ahead and share my thoughts here.

In the spirit of avoiding intensely disparate code paths, I think we should be sensitive to the existing lifespan= keyword argument and not have completely different ways of configuring dependency-powered lifespans, so I'll add an option:

Option 6 - Accept Depends objects for lifespan argument in FastAPI (app) and APIRouter constructors

We can have app = FastAPI(lifespan=lifespan_ctx_mgr), why not app = FastAPI(lifespan=Depends(lifespan_generator_func))?

Perhaps we can buttress the dependency satisfaction chain with a synthetic path operation using sentinel non-str values for the path?

This possibly falls apart due to module initialization order or other circular dependency issues, but if we carry this to APIRouter, it should be navigable.

I'm not sure I quite get you. You seem to tackle a separate (though a related) issue which is the ability for a lifespan context manager to define dependencies of its own. However, the main issue I'm trying to solve is the ability for endpoints to share their resolved dependency values in the scope of the application lifespan. (For example, sharing the same DB connection accross multiple requests of different endpoints instead of connecting and disconnecting every request).

If you want to use the same dependency value in multiple endpoints, they will still need to somehow define that in their arguments' annotation. So if I was wrong and you are trying to solve that same issue, I'm not sure how your solution would look at the endpoint side of things.

sm-Fifteen commented 1 month ago

@UltimateLobster : Option number 5 was what I had in mind for this, personally. I do get that having to import the app object wherever it is needed to define one of these could be cumbersome if one was expected to declare their lifetime deps in a dozen different modules, but the primary use-case I envision for this would be database or HTTP connection pools, which you would usually declare all in one place.

I'm not sure I would agree with letting the consumer decide whether the dep should be re-ran as if it were a regular one instead of running as a lifetime-scoped one, and I'm definitely not sold on the consumer being able to "promote" any regular dep to be lifetime-scoped. The point of lifetime-scoped deps (to me) is that they work pretty much exactly like ressources in the global namespace, but they are managed by a proper init/cleanup routine as part of the app. Having them be parametrizable or re-runable isn't what I envision them to be used for, though maybe someone who would have a use for something like that could step in and explain it to me.

jbalonso commented 1 month ago

Option 6 - Accept Depends objects for lifespan argument in FastAPI (app) and APIRouter constructors

We can have app = FastAPI(lifespan=lifespan_ctx_mgr), why not app = FastAPI(lifespan=Depends(lifespan_generator_func))? Perhaps we can buttress the dependency satisfaction chain with a synthetic path operation using sentinel non-str values for the path? This possibly falls apart due to module initialization order or other circular dependency issues, but if we carry this to APIRouter, it should be navigable.

I'm not sure I quite get you. You seem to tackle a separate (though a related) issue which is the ability for a lifespan context manager to define dependencies of its own. However, the main issue I'm trying to solve is the ability for endpoints to share their resolved dependency values in the scope of the application lifespan. (For example, sharing the same DB connection accross multiple requests of different endpoints instead of connecting and disconnecting every request).

If you want to use the same dependency value in multiple endpoints, they will still need to somehow define that in their arguments' annotation. So if I was wrong and you are trying to solve that same issue, I'm not sure how your solution would look at the endpoint side of things.

In brief: my take on FastAPI's dependency injection is that being a dependency and accepting a dependency are deeply intertwined (in terms of internal implementation), such that making a dependency a valid lifetime handler would introduce minimum new code paths (easier FastAPI development) and minimum new architectural concepts (easier FastAPI usage).

As for endpoints using dependencies with application lifespans: I note that FastAPI already prefers to cache dependency values while resolving dependencies for a given endpoint. It seems natural to me that we could pre-seed that cache with all dependencies that have been initialized with the application lifespan. To the extent this is not appropriate, use_cache is already an implemented feature of Depends().

In my case, I have expensive "warming operations" that need to be performed that are not acceptable as part of first-call latency in production, though acceptable in development when testing experimental conditions. (See the model-loading lifespan use-case for the spirit of the matter). Moreover, some of my "warming operations" require client objects (e.g. AMQP clients) and initialization activities that are appropriately part of the application lifetime rather than triggered with a call. The way my applications are configured, moreover, it sure would be nice if my application-lifetime operations could use the same dependency-powered configuration framework as the application's endpoints.

Having said all this, I found that I left a draft "Option 7" that I never posted:

Option 7 - Add lifespan_dependencies to FastAPI and APIRouter constructors

FastAPI() and APIRouter() already accept dependencies : Optional[Sequence[Depends]]] to specify dependencies to be applied to each path operation. If we added lifespan_dependencies= as an option, we could have an analogous behavior for lifespan dependencies (deliberately skipping the path-operation dependencies).

UltimateLobster commented 1 month ago

Seeing the additional feedback, I'd like to go over a few points that I consider here.

I'm not sure I would agree with letting the consumer decide whether the dep should be re-ran as if it were a regular one instead of running as a lifetime-scoped one, and I'm definitely not sold on the consumer being able to "promote" any regular dep to be lifetime-scoped. The point of lifetime-scoped deps (to me) is that they work pretty much exactly like ressources in the global namespace, but they are managed by a proper init/cleanup routine as part of the app. Having them be parametrizable or re-runable isn't what I envision them to be used for, though maybe someone who would have a use for something like that could step in and explain it to me.

While lifespan-scoped dependencies are expected to run in the scope of the application lifespan, we also need to consider they might certain needs of regular dependencies such as dependency-injection (for testing purposes) and the ability to reuse them in other lifespan/endpoint dependencies as well (imagine an endpoint dependency which queries a user record from the lifespan db connection).

Having to import your app in order to declare the dependencies, meaning your app would have to be (directly/indirectly) imported by your endpoints (or by your endpoint scoped dependencies). This would mean you'll have to structure your project in a very specific way in order to avoid circular imports, or have it so your main file would have to implicitly register these dependencies by importing them, which may be harder to use.

That is why I thought that options 2/4 may fit the most, they give a solution for the issue at hand, but do so in a way that stays familliar to existing users without needing to change a lot of code on their side. I don't look at it as "promoting an endpoint dependency to a lifetime-scope", I look at it as "endpoint dependencies separate their declration from their implementation and I would like to keep that familiar behavior".

In brief: my take on FastAPI's dependency injection is that being a dependency and accepting a dependency are deeply intertwined (in terms of internal implementation), such that making a dependency a valid lifetime handler would introduce minimum new code paths (easier FastAPI development) and minimum new architectural concepts (easier FastAPI usage).

As for endpoints using dependencies with application lifespans: I note that FastAPI already prefers to cache dependency values while resolving dependencies for a given endpoint. It seems natural to me that we could pre-seed that cache with all dependencies that have been initialized with the application lifespan. To the extent this is not appropriate, use_cache is already an implemented feature of Depends().

In my case, I have expensive "warming operations" that need to be performed that are not acceptable as part of first-call latency in production, though acceptable in development when testing experimental conditions. (See the model-loading lifespan use-case for the spirit of the matter). Moreover, some of my "warming operations" require client objects (e.g. AMQP clients) and initialization activities that are appropriately part of the application lifetime rather than triggered with a call. The way my applications are configured, moreover, it sure would be nice if my application-lifetime operations could use the same dependency-powered configuration framework as the application's endpoints.

Having said all this, I found that I left a draft "Option 7" that I never posted:

Option 7 - Add lifespan_dependencies to FastAPI and APIRouter constructors

FastAPI() and APIRouter() already accept dependencies : Optional[Sequence[Depends]]] to specify dependencies to be applied to each path operation. If we added lifespan_dependencies= as an option, we could have an analogous behavior for lifespan dependencies (deliberately skipping the path-operation dependencies).

On that same note, if I understand your solution correctly, I would still need to use the dependencies in the exact same way as today, but it would be treated as a lifespan scoped dependency because it happened to be passed to the new lifespan_dependencies of the router/app. If my dependency doesn't need to be used by any endpoint, this is a fine solution in my opinion. The issue I have is that for endpoints, this solution is less explicit and may cause confusion or unexpected behaviors because of a possible conflicts between dependency definitions.

Consider the following:

router = APIRouter(lifespan_dependencies=[Depends(my_dependency, use_cache=False)]

@router.get("/")
async def endpoint(dependency: Annotated[str, Depends(my_dependency)]
    pass

Both clearly refer to the same dependency, but now they have conflicting arguments (the use_cache argument differs). Usually this wouldn't be a problem since the declartion of the dependency (and its intended behavior) is defined in one place, therefore we only need to consider the endpoint declaration and nothing else in order to determine the expected behavior. But with your solution, it happens in two separate places and it's not obvious what should be the expected behavior.

UltimateLobster commented 1 week ago

Posting an update here: I decided to move forward with option 2 (add a dependency_scope argument to Depends) and created a draft PR #12529 which should cover the needs of everyone who posted here (Let me know if there a need this doesn't cover).

I still need to add documentation and more tests, but the feature itself is already fully implemented there I would love to hear your feedback.

Here's an example of the feature:

def mongo_client():
    with MongoClient(SOME_CONNECTION_STRING) as client:
        yield client

ActiveMongoClient = mongo_client: Annotated[MongoClient, Depends(mongo_client, dependency_scope="lifespan")]

# The `mongo_client` dependency will be evaluated once when the application starts. The returned instance will be reused for all requests to both endpoints. When the application shuts down, that single instance will be cleaned up as well.

@app.post("/users")
def insert_user(
    mongo_client: ActiveMongoClient,
    username: str,  
) -> str:
    inserted_id = mongo_client["SomeDb"]["Users"].insert_one({"username": username})
    return user_id

@app.get("/users/{user_id}/username")
def get_username(
    mongo_client: ActiveMongoClient,
    user_id: Annotated[str, Path()],
) -> str:
    user = mongo_client["SomeDb"]["Users"].find_one({"_id": user_id})
    return user["username"]

The use_cache argument works in the same way for lifespan-scoped dependencies as it is for endpoint-scoped dependencies:

def mongo_client():
    with MongoClient(SOME_CONNECTION_STRING) as client:
        yield client

DedicatedMongoConnection = mongo_client: Annotated[MongoClient, Depends(
    mongo_client, 
    dependency_scope="lifespan",
    use_cache=False
)]

# Each of these endpoints would have their own, dedicated mongo client which be reused for all requests to its respective endpoint. So overall, 2 instances will be created when the application starts. Those same 2 will be cleaned when the application shuts down.

@app.post("/users")
def insert_user(
    mongo_client: DedicatedMongoConnection ,
    username: str,  
) -> str:
    inserted_id = mongo_client["SomeDb"]["Users"].insert_one({"username": username})
    return user_id

@app.get("/users/{user_id}/username")
def get_username(
    mongo_client: DedicatedMongoConnection ,
    user_id: Annotated[str, Path()],
) -> str:
    user = mongo_client["SomeDb"]["Users"].find_one({"_id": user_id})
    return user["username"]