Freqtrade doesn't handle its sessions itself, but through SQLAlchemy's scoped_session() - which automatically handles sessions in a thread-aware manner.
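For illustration, the pattern looks roughly like this (a minimal sketch of the SQLAlchemy mechanism with a placeholder connection URL, not freqtrade's actual setup code):

from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("postgresql+psycopg2://user:secret@localhost:5432/example")  # placeholder URL
Session = scoped_session(sessionmaker(bind=engine))

# Each thread that calls Session() gets its own session from an internal
# registry keyed by the current thread; repeated calls in the same thread
# return that same session until Session.remove() is called.
session = Session()
session.execute(text("SELECT 1"))
session.rollback()   # end the transaction the statement implicitly opened
Session.remove()     # close and discard this thread's session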
Effectively, this means that we neither open nor close sessions manually ourselves, but rely on SQLAlchemy's logic and functionality to handle this for us. We do, however, make sure (especially in the API server context) to roll back transactions API requests may have opened, to avoid leaks like this.
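Schematically, such a cleanup dependency looks something like the following (a hedged FastAPI sketch with placeholder names and an in-memory database, not freqtrade's literal implementation):

from fastapi import Depends, FastAPI
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")                       # placeholder database
Session = scoped_session(sessionmaker(bind=engine))
app = FastAPI()

def clean_session():
    # Yield-style dependency: the code after "yield" runs once the request
    # is finished, whether it succeeded or raised.
    try:
        yield Session()
    finally:
        # Roll back whatever transaction the request may have left open,
        # so the pooled connection is returned in a clean state.
        Session.rollback()

@app.get("/ping")
def ping(session=Depends(clean_session)):
    return {"status": "pong"}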
If this is a problem / leaky behavior, it's most likely a bug in SQLAlchemy - at least I'm not aware how or where we could change something to improve behavior here (other than manually handling sessions, which I'm not even going to start to look at, as it'd complicate the code beyond reason and would probably introduce a lot more bugs in the process).
Looking at the logs / version, I'd also suggest looking at this with develop (and updated dependencies). SQLAlchemy was recently bumped from 1.4 to 2.0.x, which might improve behavior here.
From the logs I also noticed that you're using FreqAI. While not a problem in itself, you should make sure to limit it to N-1 parallelism (where N = your core count), so one core is always available to process "random" requests.
Otherwise it's certainly possible that web requests remain "hanging" (you should see errors / timeouts in the UI), as freqtrade has no more resources available to actually handle the incoming requests.
If Postgres is on the same server (which it seems to be in your case), the better value might actually be N-2, to leave one core entirely to Postgres. I'm not sure how Postgres handles heavy resource contention; whenever I ran Postgres in the past, it was the only heavy consumer on that system. If it can't get CPU, the two (freqtrade / Postgres) might contend for the same resource (while FreqAI is training happily on all cores), causing sessions to fail to close (or to stick around longer than they should).
To be sure to exclude heavy load as the problem, the ideal way would be to run a test without FreqAI on that same system. Otherwise resource contention can't be excluded.
Thanks for your quick response.
Looking at the logs / version, I'd also suggest looking at this with develop (and updated dependencies). SQLAlchemy was recently bumped from 1.4 to 2.0.x, which might improve behavior here.
I've updated to develop, running SQLAlchemy==2.0.5.post1.
From the logs I also noticed that you're using FreqAI. While not a problem in itself, you should make sure to limit it to N-1 parallelism (where N = your core count), so one core is always available to process "random" requests.
I'm pretty sure it's not a resource problem. Looking at the host, it's only using 50% of the assigned CPU resources. I've limited the FreqAI core count even further. Right now freqtrade is only using 30% of the available CPU resources.
Apart from that, it's only happening when the QueuePool limit is reached. I can clearly track it down to that, as whatever sizes I'm using, it crashes when the thresholds are reached. Hopefully the updated SQLAlchemy version fixes it.
Will keep you posted.
I'm pretty sure it's not a resource problem. Looking at the host, it's only using 50% of the assigned CPU resources.
Then limiting FreqAI further won't make sense. It was a thought where I could imagine this happening, as I know FastAPI spins up new threads for new requests if old requests haven't finished yet (up to a limit, which I believe is 40). Obviously that would also mean there'd potentially be >40 sessions (40 for FastAPI, plus the bot itself actually running). With regular workloads (1 FreqUI instance) I can't get it to more than 7 FastAPI threads (but my system is not maxed out otherwise). Once load dies down, these threads stick around for about a minute or so, but they're stopped after that (no thread -> no session attached to it).
If you (on top of that) also use trade info in the FreqAI strategy / model, then that'd be another X sessions, depending on how many threads you're using there.
I've just given this a go for the last hour or so.
freqUI always open, with auto-refresh on. I did "hammer" the API (constantly pressing refresh in the UI) - I was able to temporarily get sessions up to 13 (13 sessions attached to freqtrade, anyway). But even with hammering, I'm unable to get to more sessions.
I've then left it at that for an hour, coming back to 6 open sessions. Even trying to hammer it now, I seem to be unable to get it to more than 7-8 sessions.
This suggests 2 things to me:
Hey, sorry for my late response.
Unfortunately I couldn't find a way to prevent the QueuePool-size-reached crashes without changing FT source. I always had sessions waiting for ClientRead in the state idle in transaction. They were adding up and not closing until the QueuePool size was reached. They were adding up while I was using FreqUI and with more open trades. You mentioned that sessions also increase when information from the Trade object is queried. I am using it in my custom_exit and in adjust_trade_position multiple times, and I'm using 20 max_open_trades. So I also noticed that they were adding up not only when using FreqUI, but also more quickly the more open trades were being handled by adjust_trade_position/custom_exit.
I've then done a little research about the state idle in transaction, waiting for ClientRead. According to this post, if sessions are in that state for a longer time, it's most probably due to the application not closing the session properly. I've then checked how to close connections and stumbled across the SQLAlchemy AUTOCOMMIT isolation level. I've now changed my engine connection properties from https://github.com/freqtrade/freqtrade/blob/8f29312c9e6f6637628965c28dcb9409f562c29e/freqtrade/persistence/models.py#L49 to
engine = create_engine(db_url, isolation_level="AUTOCOMMIT", future=True, **kwargs)
(I also still have to increase the QueuePool size because of my many max_open_trades and the Trade object being queried multiple times within my strategy.)
After switching to the AUTOCOMMIT isolation level I don't have the "dangling" sessions in the state idle in transaction anymore.
Right now, with the highly volatile market, all my 20 slots were "burning" and my sessions still added up, but stopped at around 40 sessions (without the setting I even had around 60 sessions; I haven't tried with an even higher threshold, but I guess it would have been reached too).
When querying the pg_stat_activity view, I can see that most of the sessions are getting reused now. This is with FT currently idle without open trades; I just started a new dry run from scratch. After some time, and with some open trades, there are still some sessions staying open for multiple hours, querying some old trade information (unfortunately I don't have detailed information about the query right now as I just started a fresh dry run, but I can provide those queries in case you are interested).
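A query along the following lines (generic PostgreSQL, shown here via SQLAlchemy with a placeholder connection URL - illustrative, not the exact query from my setup) lists the sessions stuck in that state and how long they have been idle:

from sqlalchemy import create_engine, text

# Placeholder connection URL - point it at the same database the bot uses.
engine = create_engine("postgresql+psycopg2://db_user:secret@localhost:5432/freqtrade")

stuck_sessions = text("""
    SELECT pid,
           state,
           wait_event,
           now() - xact_start AS in_transaction_for,
           left(query, 60)   AS last_query
    FROM pg_stat_activity
    WHERE datname = current_database()
      AND state = 'idle in transaction'
    ORDER BY xact_start
""")

with engine.connect() as conn:
    for row in conn.execute(stuck_sessions):
        print(dict(row._mapping))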
So for me, setting isolation_level="AUTOCOMMIT" solved it, but unfortunately it's just a workaround and doesn't address the root cause.
Well, that's the fun part - we do close sessions (calling rollback does this).
I noticed the idle in transaction sessions too - but only 2-3 at maximum - and they were being reused as new API calls came in. Even doing hundreds of calls did not increase these sessions - I got 6-7 at maximum (using a "hammer the API" script).
We don't use the Trade object in the API file - therefore all DB requests MUST go through this dependency.
While autocommit will for sure fix it, I'd strongly recommend against using it, as it can have other, unintended side-effects, even errors (we assume explicit commits are used) - and if I remember correctly, it did in the past (which is why we moved away from it years ago).
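To illustrate the pitfall (a minimal sketch with a placeholder table and URL, not freqtrade code): with AUTOCOMMIT every statement is committed the moment it runs, so any code path that relies on a later rollback silently loses that safety net:

from sqlalchemy import create_engine, text

# Placeholder URL and table - not freqtrade code.
engine = create_engine("postgresql+psycopg2://user:secret@localhost:5432/example",
                       isolation_level="AUTOCOMMIT")

with engine.connect() as conn:
    # With AUTOCOMMIT this statement is committed immediately ...
    conn.execute(text("UPDATE trades SET is_open = false WHERE id = 1"))
    # ... so this rollback is effectively a no-op and does NOT undo the update,
    # unlike with the default (explicit-commit) behavior.
    conn.rollback()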
I am using it in my custom_exit and in adjust_trade_position multiple times.
I don't think these are called by regular FreqAI training runs - but if you were using it in one of the FreqAI-specific populate* methods, then it'd for sure be potentially problematic (these eventually run in different sessions).
As such - your sessions would then be "api + freqai parallel threads + some regular sessions".
How is your Postgres configured? For my tests, I'm using the docker container postgres:14-alpine - without any configuration other than the password.
I'm happy to look at this further if you can provide a minimal reproduction (clearly no FreqAI, must reproduce within max. 1 hour, ideally with a mostly empty strategy). Now obviously, it won't need to open 40 sessions - but it must clearly be increasing sessions without cleaning them up again - which I've not been able to reproduce beyond 10 sessions, which is however normal / expected behavior.
If we aren't able to get such a reproduction, it's most likely tied to something in either your configuration or your strategy.
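For reference, the reproduction strategy really doesn't need to be more than something like this (a rough skeleton along the documented IStrategy interface; the always-enter logic and names are placeholders I'm suggesting, not taken from your setup):

from datetime import datetime
from pandas import DataFrame
from freqtrade.strategy import IStrategy


class MinimalReproStrategy(IStrategy):
    # Deliberately aggressive settings to keep many trades open.
    timeframe = "5m"
    minimal_roi = {"0": 0.01}
    stoploss = -0.10
    position_adjustment_enable = True   # enables adjust_trade_position (DCA)

    def populate_indicators(self, dataframe: DataFrame, metadata: dict) -> DataFrame:
        return dataframe

    def populate_entry_trend(self, dataframe: DataFrame, metadata: dict) -> DataFrame:
        # Placeholder "always enter" signal, just to fill the open-trade slots.
        dataframe.loc[:, "enter_long"] = 1
        return dataframe

    def populate_exit_trend(self, dataframe: DataFrame, metadata: dict) -> DataFrame:
        return dataframe

    def adjust_trade_position(self, trade, current_time: datetime,
                              current_rate: float, current_profit: float, **kwargs):
        # Touch the trade attributes from the report, then do nothing.
        _ = trade.nr_of_successful_entries
        _ = trade.entry_side
        return None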
I also noticed that some are being reused, but I also noticed some which hold very old queries of past trades. For now, autocommit works fine for me - no issues after around 1000 trades and 2600 orders.
From my understanding FreqAI plays no role in this issue (except as a hardware resource issue, which is not the case here).
I was talking about the Trade object queries in my custom_exit and adjust_trade_position, which, if I got it right, might query the DB.
Currently I'm using:
trade.nr_of_successful_entries (custom_exit and adjust_trade_position)
trade.entry_side (custom_exit and adjust_trade_position)
trade.select_filled_orders (adjust_trade_position)
I noticed that the more open trades I have, the more DB sessions there are. When there are no open trades, I can "hammer" FreqUI and get no increase, so maybe it's the Trade object queries?
This is my PostgreSQL docker-compose.yml:
version: '3.5'

services:
  postgres:
    container_name: postgres
    image: postgres:latest
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PW}
    ports:
      - "5432:5432"
    volumes:
      - ./data_postgres:/var/lib/postgresql/data
    restart: always
    networks:
      - backend

  pgadmin:
    container_name: pgadmin
    image: dpage/pgadmin4:latest
    environment:
      - PGADMIN_DEFAULT_EMAIL=${PGADMIN_MAIL}
      - PGADMIN_DEFAULT_PASSWORD=${PGADMIN_PW}
    ports:
      - "5050:80"
    volumes:
      - ./data_pgadmin/pgadmin:/var/lib/pgadmin
    restart: always
    networks:
      - backend

networks:
  backend:
    name: freqtrade_backend
    external: true
I will try to reproduce it with a bare-minimum strategy including DCA on a different host over the weekend.
From my understanding FreqAI plays no role in this issue (except as a hardware resource issue, which is not the case here).
This is also my assumption - but a "quick" reproduction without FreqAI would be useful, as it excludes one more component (and makes it possible to run a test in the background with a debugger attached while still continuing to work on other things).
Well, you're doing similar things to me - basically using an unmodified Postgres container 👍
Currently I'm using:
trade.nr_of_successful_entries (custom_exit and adjust_trade_position) trade.entry_side (custom_exit and adjust_trade_position) trade.select_filled_orders (adjust_trade_position)
These are using attributes of the (already selected) trade object - but aren't doing active queries against the database (that'd be Trade.get_trades() or Trade.get_trades_proxy() and a few other, non-documented functions).
I therefore struggle to see how they are actually relevant.
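To make the distinction concrete, roughly (an illustrative callback body, not your strategy):

from freqtrade.persistence import Trade


def adjust_trade_position(self, trade, current_time, current_rate,
                          current_profit, **kwargs):
    # Reads of the trade object that was already selected for this callback -
    # these don't issue a new active query on their own.
    entries = trade.nr_of_successful_entries
    side = trade.entry_side
    filled_entries = trade.select_filled_orders(trade.entry_side)

    # Class-level helpers like this one are what actually run queries
    # through a DB session.
    open_trades = Trade.get_trades_proxy(is_open=True)
    return None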
Please give #8344 a try and let me know. I couldn't get to the "idle in transaction" state anymore with that change.
@gaugau3000 this might also be of interest to you (if I understood your comment on Discord correctly) - so I'd appreciate it if you could give the branch a try.
Thanks a lot! I've pulled the branch, updated dependencies and restarted my dry run. The dry run I'm using right now trades far less often, and I haven't had the issue anymore with 3-6 open trades. So it might only be reproducible when there are tons of trades (for me it happened with 15-20 open trades). As I was also monitoring the DB sessions with the current strategy (fewer trades), I will still be able to compare.
I will monitor it for some time and provide feedback as soon as there are further insights.
Thanks a lot for your support!
Hey xmatthias, thanks for your support. I can already say right now that your changes fixed it. It stays at 5 open sessions (no fluctuations, just 5 in total), which are properly getting reused. Great job! I think the issue can be closed, unless you want me to test some further specific scenarios.
@xmatthias many thanks, I will take the latest develop to test!
Describe your environment
Describe the problem:
I'm using Freqtrade with PostgreSQL as the database (--db-url postgresql+psycopg2://db_user:*****@localhost:5432/Hasensachen_FreqAI_07). The database is running via Docker. Freqtrade is running as a systemd service. Both are running on the same host. Everything runs fine except that after some time freqtrade crashes with the following error:

raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 20 overflow 40 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
When I started using PostgreSQL, I was using the default QueuePool of size 5, overflow 10. I have increased it to size 20, overflow 40 by changing the following line https://github.com/freqtrade/freqtrade/blob/a31045874ed51faf22f7672ec4b8823df59fe83a/freqtrade/persistence/models.py#L48 to

engine = create_engine(db_url, pool_size=20, max_overflow=40, future=True, **kwargs)
Unfortunately this does not prevent Freqtrade from crashing, but postpones the crash. With the default QueuePool settings it was running fine for a few hours; with the increased sizes it runs for around 1-2 days before it crashes.
I'm running pgAdmin to manage my DB. From there I can clearly see that the database sessions are increasing over time (unfortunately there is no history available). Right now I'm near my limit again (60 sessions). The following picture shows what happens after a restart of freqtrade: after the restart the sessions are reduced to below 10 and then add up to the limit again. Below, just as an example, are a few of the open sessions (please excuse the screenshots):
After the restart they are gone. What I experienced is that they increase in real time when I click around in FreqUI.
Steps to reproduce:
Relevant code exceptions or logs
In the attached file ft_db_error.log you can see a full run until FT crashes.

Related info
I've found #7364, which shows the same error, but there it was concluded to be a connection issue. I'm pretty sure that there is no connection issue on my side, as I can clearly see from the DB sessions in pgAdmin that they are increasing over time and the bot crashes as soon as they reach the QueuePool size limit - also, both are running on the same host. Hardware is no issue either, as at the time of the crash there are plenty of free resources available to the host.
Thanks a lot for your support!