apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.27k stars 14.34k forks source link

AIP-84 Fix session handling #44187

Closed pierrejeambrun closed 2 days ago

pierrejeambrun commented 3 days ago

I scratched my head on this one.

Problem

Tests are green, CI is green, manual API testing looks good, but when we use the UI, we see random 500 errors from time to time, cf screenshot.

All that hint to a bug in the session management. Between the DB dependency from fastapi, the way fastapi uses multiple thread to serve sync route requests, how we create session in airflow utility code and when / how are sessions rolled back / commited.

TLDR:

The default Session factory uses sessionmaker that will make a session thread local, on the other hand FastAPI can re-use the same thread to serve multiple requests but because of the dependency we use, the session is closed after each request making successive requests on the same thread to fail.

Session lifecycle that we want is not per thread but per HTTTP request. To achieve that we delegate the session handling to FastAPI dependency system and use a normal session factory not a threadlocal one.

Screenshot 2024-11-19 at 16 06 47

ashb commented 3 days ago

Doesn't this mean that each request is going to open a new connection, and there is no re-use or pooling between requests?

pierrejeambrun commented 3 days ago

Doesn't this mean that each request is going to open a new connection, and there is no re-use or pooling between requests?

Each request will open a new Session, I am not super familiar with our code regarding pooling but I would say that pooling setting is done at the engine level, so that would still be handled by the underlying engine ?

The only difference with what we had before is the Session factory which is now not thread local in some cases but 'request' local.

ashb commented 3 days ago

It's not our code doing the pooling, we're using the feature built in to SQLA to handle this

pierrejeambrun commented 3 days ago

If we have more concurrent request in FastAPI than connections allowed in the Pool, we might run into a problem indeed.

But we can argue that this would already be the case with thread local session. FastAPI handles sync requests in different threads so multiple concurrent requests end up using multiple threads and therefore as many different sessions checkout a connection from the pool.

I don't think that removing a thread local registry for session directly impact how connection pooling is done in this case.

But I might be missing something.

ashb commented 3 days ago

https://docs.sqlalchemy.org/en/14/orm/contextual.html#using-thread-local-scope-with-web-applications says how we should be doing this. Creating a session object per request is not it.

pierrejeambrun commented 3 days ago

https://docs.sqlalchemy.org/en/14/orm/contextual.html#using-thread-local-scope-with-web-applications says how we should be doing this. Creating a session object per request is not it.

I don't think that works, as mentioned in the documentation there are exceptions for asynchronous webservers:

with notable exceptions such as the asynchronous frameworks Twisted and Tornado,

FastAPI I believe falls in that category. There is no direct correspondence between the request and 1 single thread processing it. (Dependencies can be resolved in different thread, it's even less true when we migrate to full async support). This is why we cannot use scoped_session for FastAPI.

All that is not super intuitive, this is why "I scratched my head on this one."

pierrejeambrun commented 3 days ago

https://github.com/fastapi/fastapi/issues/726#issuecomment-584371305 I think this comment is relevant.