freqtrade / freqtrade

Free, open source crypto trading bot
https://www.freqtrade.io
GNU General Public License v3.0

PostgreSQL DB Sessions are not closed properly? #8285

Closed Smidelis closed 1 year ago

Smidelis commented 1 year ago

Describe your environment

Note: All issues other than enhancement requests will be closed without further comment if the above template is deleted or not filled out.

Describe the problem:

I'm using Freqtrade with PostgreSQL as the database (--db-url postgresql+psycopg2://db_user:*****@localhost:5432/Hasensachen_FreqAI_07). The database runs via Docker; Freqtrade runs as a systemd service. Both are on the same host. Everything runs fine except that after some time Freqtrade crashes with the following error: raise exc.TimeoutError(sqlalchemy.exc.TimeoutError: QueuePool limit of size 20 overflow 40 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r). When I started using PostgreSQL I was on the default QueuePool of size 5, overflow 10. I increased it to size 20, overflow 40 by changing the following line https://github.com/freqtrade/freqtrade/blob/a31045874ed51faf22f7672ec4b8823df59fe83a/freqtrade/persistence/models.py#L48 to engine = create_engine(db_url, pool_size=20, max_overflow=40, future=True, **kwargs)
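The pool parameters involved here can be inspected at runtime via SQLAlchemy's QueuePool API. A minimal sketch with the same limits as the patched models.py line; an in-memory SQLite engine stands in for PostgreSQL purely so the example is self-contained:

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# Same pool parameters as the patched create_engine() call above;
# SQLite is used only to keep this snippet runnable without a server.
engine = create_engine(
    "sqlite://",
    poolclass=QueuePool,
    pool_size=20,      # baseline connections kept open
    max_overflow=40,   # extra connections allowed under load
    pool_timeout=30,   # seconds to wait before TimeoutError is raised
)

conn = engine.connect()
print(engine.pool.status())      # human-readable pool summary
print(engine.pool.checkedout())  # connections currently checked out
conn.close()
```

Logging `engine.pool.status()` periodically is one way to watch checkouts climb toward the limit before the TimeoutError fires.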

Unfortunately this does not prevent Freqtrade from crashing; it only postpones the crash. With the default QueuePool settings it ran fine for a few hours; with the increased sizes it runs for around 1 - 2 days before crashing.

I'm running pgAdmin to manage my DB. From there I can clearly see that the database sessions increase over time (unfortunately no history is available). Right now I'm near my limit again (60 sessions). The following screenshot shows what happens after a restart of Freqtrade: (screenshot) After the restart the sessions drop to below 10 and then climb back up to the limit. Below, as an example, a few of the open sessions: (screenshots) (Please excuse the screenshots)

After the restart they are gone. What I experienced is that they increase in real time when I click around in FreqUI.

Steps to reproduce:

  1. Use Freqtrade with PostgreSQL
  2. Leave Freqtrade running for a while
  3. Click around in FreqUI to increase the sessions manually.

Relevant code exceptions or logs

2023-03-06 15:27:19,459 - uvicorn.error - ERROR - Exception in ASGI application
Traceback (most recent call last):
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 86, in __call__
    raise exc from None
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 82, in __call__
    await self.app(scope, inner_receive, inner_send)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/fastapi/applications.py", line 271, in __call__
    await super().__call__(scope, receive, send)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/applications.py", line 118, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/myuser/freqtrade/freqtrade/rpc/api_server/api_v1.py", line 201, in locks
    return rpc._rpc_locks()
  File "/home/myuser/freqtrade/freqtrade/rpc/rpc.py", line 926, in _rpc_locks
    locks = PairLocks.get_pair_locks(None)
  File "/home/myuser/freqtrade/freqtrade/persistence/pairlock_middleware.py", line 73, in get_pair_locks
    return PairLock.query_pair_locks(pair, now, side).all()
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2773, in all
    return self._iter().all()
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2916, in _iter
    result = self.session.execute(
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1713, in execute
    conn = self._connection_for_bind(bind)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
    return self._transaction._connection_for_bind(
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 747, in _connection_for_bind
    conn = bind.connect()
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/future/engine.py", line 406, in connect
    return super(Engine, self).connect()
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3315, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3394, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3361, in _wrap_pool_connect
    return fn()
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 327, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 894, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 493, in checkout
    rec = pool._do_get()
  File "/home/myuser/freqtrade/.env/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 134, in _do_get
    raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 20 overflow 40 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)

In the attached file ft_db_error.log you can see a full run until FT crashes. ft_db_error.log

Related info

I've found #7364, which shows the same error, but there it was concluded to be a connection issue. I'm pretty sure there is no connection issue on my side, as I can clearly see from the DB sessions in pgAdmin that they increase over time and the bot crashes as soon as they reach the QueuePool size limit. Also, both are running on the same host. Hardware is not an issue either, as there are plenty of free resources available on the host at the time of the crash.

Thanks a lot for your support!

xmatthias commented 1 year ago

Freqtrade doesn't handle its sessions itself, but goes through SQLAlchemy's scoped_session(), which automatically handles sessions in a thread-aware manner.

Effectively, this means that we neither create nor close sessions manually, but rely on SQLAlchemy's logic and functionality to handle this for us. We do, however, make sure (especially in the API server context) to roll back transactions that API requests may have opened, to avoid leaks like this.

If this is a problem / leaky behavior, it's most likely a bug in SQLAlchemy. At least I'm not aware of how or where we could change something to improve behavior here (other than handling sessions manually, which I'm not even going to start to look at, as it'd complicate the code beyond reason and would probably introduce a lot more bugs in the process).
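For reference, the thread-aware pattern described above can be sketched as follows. This is a simplified illustration of SQLAlchemy's scoped_session, not freqtrade's actual wiring; the in-memory SQLite engine is just a stand-in:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")

# scoped_session hands every thread its own Session instance transparently;
# repeated calls from the same thread return the same object.
Session = scoped_session(sessionmaker(bind=engine))

s1 = Session()
s2 = Session()
assert s1 is s2   # same thread -> the very same session object

Session.remove()  # close the thread's session, return its connection to the pool
```

Because each worker thread transparently gets (and keeps) its own session, the number of sessions tracks the number of live threads, which is relevant to the FastAPI thread-pool discussion further down.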


Looking at the logs / version, I'd also suggest looking at this with develop (and updated dependencies). SQLAlchemy was recently bumped from 1.4 to 2.0.x, which might improve behavior here.


From the logs I also noticed that you're using FreqAI. While not a problem in itself, you should make sure to limit it to N-1 parallelism (where N = your core count), so one core is always available to process "random" requests.

Otherwise it's certainly possible that web requests remain "hanging" (you should see errors / timeouts in the UI) because Freqtrade has no more resources available to actually handle the incoming requests.

If Postgres is on the same server (which it seems to be in your case), the better value might actually be N-2, to leave one core entirely to Postgres. I'm not sure how Postgres handles heavy resource contention; whenever I ran Postgres in the past, it was the only heavy consumer on that system. If it can't get CPU, the two (Freqtrade / Postgres) might contend for the same resource (while FreqAI is happily training on all cores), causing sessions to fail to close (or to stick around longer than they should).


To be sure to exclude heavy load as the problem, the ideal way would be to run a test without FreqAI on that same system. Otherwise resource contention can't be excluded.

Smidelis commented 1 year ago

Thanks for your quick response.

Looking at the logs / version - i'd also suggest to look at this with develop (and updated dependencies). SQLAlchemy was recently bumped from 1.4 to 2.0.x - which might improve behavior in this.

I've updated to develop, running SQLAlchemy==2.0.5.post1.

From the logs i also noticed that you're using freqAI - while not in itself a problem, you should make sure to limit it to N-1 parallelism (where N = your core count) - so 1 core is always available to process "random" requests.

I'm pretty sure it's not a resource problem. Looking at the host, it's only using 50% of the assigned CPU resources. I've limited the FreqAI core count even further; right now Freqtrade is only using 30% of the available CPU resources.

Apart from that, it only happens when the QueuePool limit is reached. I can clearly track it down to that, as whatever sizes I use, it crashes when the thresholds are reached. Hopefully the updated SQLAlchemy version fixes it.

Will keep you posted.

xmatthias commented 1 year ago

I'm pretty sure it's not a resource problem. Looking at the host, it's only using 50% of the assigned CPU resources.

Then limiting FreqAI further won't make sense. It's a scenario where I could imagine this happening, as I know FastAPI spins up new threads for new requests if old requests haven't finished yet (up to a limit, which I believe is 40). Obviously that would also mean there'd potentially be more than 40 sessions (40 for FastAPI, plus the bot itself running). With regular workloads (one freqUI instance) I can't get it to more than 7 FastAPI threads (but my system isn't maxed out otherwise). Once the load dies down, these threads stick around for about a minute or so, but they're stopped after that (no thread -> no session attached to it).

If (on top of that) you also use trade info in the FreqAI strategy / model, that'd be another X sessions, depending on how many threads you're using there.

xmatthias commented 1 year ago

I've just given this a go for the last hour or so.

freqUI always open, with auto-refresh on. I did "hammer" the API (constantly pressing refresh in the UI) and was able to temporarily get sessions up to 13 (13 sessions attached to freqtrade, anyway). But even with hammering, I'm unable to get more sessions.

I've then left it at that for an hour, coming back to 6 open sessions. Even trying to hammer it now, I seem unable to get it to more than 7-8 sessions.

This suggests 2 things to me:

Smidelis commented 1 year ago

Hey, sorry for my late response. Unfortunately I couldn't find a way to prevent the QueuePool-size-reached crashes without changing the FT source. I always had sessions waiting for ClientRead in the state "idle in transaction". They added up and didn't close until the QueuePool size was reached. They accumulated when I was using FreqUI and with more open trades. You mentioned that sessions also increase when information from the trade object is queried. I use it in my custom_exit and in adjust_trade_position multiple times, and I'm using 20 max_open_trades. So I also noticed that they added up not only when using FreqUI, but also more quickly the more open trades were handling adjust_trade_position / custom_exit.

I've then done a little research about the state "idle in transaction", waiting for ClientRead. According to this post, if sessions are in that state for a longer time, it's most probably due to the application not closing the session properly. I then checked how to close connections and stumbled across the SQLAlchemy AUTOCOMMIT isolation level. I've now changed my engine connection properties from https://github.com/freqtrade/freqtrade/blob/8f29312c9e6f6637628965c28dcb9409f562c29e/freqtrade/persistence/models.py#L49 to

engine = create_engine(db_url, isolation_level="AUTOCOMMIT", future=True, **kwargs)

(I also still have to increase the QueuePool size because of my many max_open_trades and the trade object being queried multiple times within my strategy.)

After switching to the AUTOCOMMIT isolation level I don't have the "dangling" sessions in the state "idle in transaction" anymore. Right now, with the highly volatile market, all my 20 slots were "burning" and my sessions still added up, but they stopped at around 40 (without the setting I even had around 60 sessions; I haven't tried an even higher threshold, but I guess it would have been reached too).

When querying the pg_stat_activity view, I can see that most of the sessions are now being reused. This is the current FT idle without open trades, just after starting a new dry run from scratch: (screenshot) After some time, and with some open trades, there are still some sessions that stay open for multiple hours querying some old trade information (unfortunately I don't have detailed information about the query right now, as I just started a fresh dry run, but I can provide those queries in case you're interested).
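The pg_stat_activity check described above can also be scripted. A hedged sketch; the DSN and the helper name are made up for illustration, and it needs a reachable PostgreSQL server to actually run:

```python
from sqlalchemy import create_engine, text

# Lists sessions stuck in "idle in transaction", oldest first --
# the same information pgAdmin shows, queried directly.
IDLE_QUERY = text("""
    SELECT pid, state, now() - xact_start AS open_for, query
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
    ORDER BY xact_start
""")

def find_idle_sessions(dsn: str):
    """Return the rows of pg_stat_activity that look leaked."""
    engine = create_engine(dsn)
    with engine.connect() as conn:
        return conn.execute(IDLE_QUERY).fetchall()

# Example call (placeholder credentials):
# find_idle_sessions("postgresql+psycopg2://db_user:secret@localhost:5432/mydb")
```

Running something like this on a timer would give the session history that pgAdmin doesn't keep.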

So for me, setting isolation_level="AUTOCOMMIT" solved it, but unfortunately it's just a workaround and doesn't address the root cause.

xmatthias commented 1 year ago

Well, that's the fun part: we do close sessions (calling rollback does this).

https://github.com/freqtrade/freqtrade/blob/8f29312c9e6f6637628965c28dcb9409f562c29e/freqtrade/rpc/api_server/deps.py#L18-L25

I noticed the "idle in transaction" sessions too, but only 2-3 at maximum, and they were being reused as new API calls came in. Even doing hundreds of calls did not increase these sessions; I got 6-7 at maximum (using a "hammer api" script). We don't use the Trade object in the API file, therefore all DB requests MUST go through this dependency.
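The dependency linked above is generator-based; its general shape can be sketched as follows. This is a simplified illustration (using a plain scoped_session and a hypothetical get_db_session name rather than freqtrade's actual Trade machinery):

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")
SessionLocal = scoped_session(sessionmaker(bind=engine))

def get_db_session():
    # FastAPI-style generator dependency: yield the session for the
    # duration of the request, then unconditionally roll back whatever
    # transaction the request left open, so the connection returns to
    # the pool clean instead of sitting "idle in transaction".
    session = SessionLocal()
    try:
        yield session
    finally:
        session.rollback()
```

The key point is the finally block: even if the request handler raises, the rollback runs and no transaction is left dangling on the pooled connection.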

While autocommit will surely fix it, I'd strongly recommend against using it, as it can have other unintended side effects, even errors (we assume explicit commits are used), and if I remember correctly, it did in the past (which is why we moved away from it years ago).

I am using it in my custom_exit and in adjust_trade_position multiple times.

I don't think these are called by regular FreqAI training runs, but if you were using it in one of the FreqAI-specific populate* methods, then it'd for sure be potentially problematic (these run in different sessions, eventually). As such, your sessions would then be "API + FreqAI parallel threads + some regular sessions".


How is your Postgres configured? For my tests, I'm using the docker container postgres:14-alpine, without any configuration other than the password.


I'm happy to look at this further if you can provide a minimal reproduction (clearly no FreqAI, must reproduce within max. 1 hour, ideally with a mostly empty strategy). Now obviously, it won't need to open 40 sessions, but it must clearly show sessions increasing without being cleaned up again, which I've not been able to reproduce beyond 10 sessions (which is, however, normal / expected behavior).

If we aren't able to get such a reproduction, it's most likely tied to something in either your configuration or your strategy.

Smidelis commented 1 year ago

I also noticed that some are being reused, but I also noticed some which hold very old queries of past trades. For now autocommit works fine for me: no issues after around 1000 trades and 2600 orders.

From my understanding, FreqAI plays no role in this issue (except for a hardware resource issue, which is not the case here). I was talking about the Trade object queries in my custom_exit and adjust_trade_position which, if I got it right, might query the DB. Currently I'm using

trade.nr_of_successful_entries (custom_exit and adjust_trade_position)
trade.entry_side (custom_exit and adjust_trade_position)
trade.select_filled_orders (adjust_trade_position)

I noticed that the more open trades I have, the more DB sessions. In case there are no open trades I can "hammer" FreqUI and get no increase, so maybe it's the Trade object queries?

This is my postgresql docker-compose.yml

version: '3.5'
services:
  postgres:
    container_name: postgres
    image: postgres:latest
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PW}
    ports:
      - "5432:5432"
    volumes:
      - ./data_postgres:/var/lib/postgresql/data
    restart: always
    networks:
      - backend
  pgadmin:
    container_name: pgadmin
    image: dpage/pgadmin4:latest
    environment:
      - PGADMIN_DEFAULT_EMAIL=${PGADMIN_MAIL}
      - PGADMIN_DEFAULT_PASSWORD=${PGADMIN_PW}
    ports:
      - "5050:80"
    volumes:
      - ./data_pgadmin/pgadmin:/var/lib/pgadmin
    restart: always
    networks:
      - backend
networks:
  backend:
    name: freqtrade_backend
    external: true

I will try to reproduce it with a bare minimum strategy including DCA on a different host on the weekend.

xmatthias commented 1 year ago

From my understanding FreqAI does play no role in this issue (except a hardware resource issue, which is not the case).

This is also my assumption - but a "quick" reproduction without freqAI will be useful, as that excludes one more component (as well as makes it possible to run a test in the background with a debugger attached, while still continuing to work on other things).

Well, you're doing similar things to me: basically using an unmodified Postgres container 👍

Currently I'm using

trade.nr_of_successful_entries (custom_exit and adjust_trade_position)
trade.entry_side (custom_exit and adjust_trade_position)
trade.select_filled_orders (adjust_trade_position)

These are attributes of the (already selected) trade object and don't run active queries against the database (that'd be Trade.get_trades() or Trade.get_trades_proxy(), and a few other non-documented functions). I therefore struggle to see how they are actually relevant.

xmatthias commented 1 year ago

Please give #8344 a try and let me know. I couldn't get to the "idle in transaction" state anymore with that change.

@gaugau3000 this might also be of interest to you (if I understood your comment on Discord correctly), so I'd appreciate it if you could give the branch a try.

Smidelis commented 1 year ago

Thanks a lot! I've pulled the branch, updated dependencies and restarted my dry run. The dry run I'm using right now trades much more rarely. I haven't had the issue anymore with 3 - 6 open trades, so it might only be reproducible when there are tons of trades (for me, happening with 15 - 20 open trades). As I was also monitoring the DB sessions with the current strategy (fewer trades), I will still be able to compare.

I will monitor it for some time and provide feedback as soon as there are further insights.

Thanks a lot for your support!

Smidelis commented 1 year ago

Hey xmatthias, thanks for your support. I can already say that your changes fixed it. It stays at 5 open sessions (no fluctuations, just 5 in total), which are properly reused. Great job! I think the issue can be closed, unless you want me to test some further specific scenarios.

gaugau3000 commented 1 year ago

@xmatthias many thanks, I will take the latest develop to test!