Open spapas opened 4 months ago
Hello friends, this problem is still bugging us. We use the latest redash version and try to execute a query as simple as this:
select * from passenger
Please notice that the passenger table is really big (about 200M rows). The problem is that when we run the query, the redash rqworker starts consuming huge amounts of memory (around 10 GB, all the free memory of the server) and makes the whole server unresponsive.

Can you please give this a little love, because we can't really use redash while we have this problem.

There must definitely be a memory leak somewhere. Can you try running a query that returns a lot of results on a postgresql database to check the behaviour?
select * from passenger
That's a good example of a SQL query to never run, as that will attempt to return the entire contents of the table (all fields, all data) back to Redash, which will attempt to fit that in memory.
Instead you should be running queries that extract the summary data you want. Something like this instead:
select count(*) from passenger
Which will return the number of rows in the passenger table.
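To make the contrast concrete, here is a minimal sketch (using Python's stdlib `sqlite3` as a stand-in for PostgreSQL, with an illustrative `passenger` table): the `select *` form materializes every row on the client, while the aggregate returns a single row no matter how big the table is.

```python
import sqlite3

# Stand-in for the real PostgreSQL database: an in-memory
# SQLite table named "passenger" with 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passenger (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO passenger VALUES (?, ?)",
    ((i, f"p{i}") for i in range(10_000)),
)

# select * pulls the entire table into client memory at once...
all_rows = conn.execute("SELECT * FROM passenger").fetchall()
print(len(all_rows))  # 10000 rows held in memory simultaneously

# ...while the aggregate ships back exactly one row.
(count,) = conn.execute("SELECT count(*) FROM passenger").fetchone()
print(count)
```

At 200M rows the first pattern is what exhausts the worker's memory; the second stays constant-size regardless of table size.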
First of all, I am aware that this query returns a lot of data and shouldn't (usually) be run, however please notice:

- For now I have implemented a hacky workaround: queries are allowed to run for up to 30 seconds before being killed, so the leak can't grow big enough to make the server unresponsive. However, this means we can't run most queries in redash and need to use dbeaver or psql directly.
- As I said before, this is definitely a bug that was introduced within the last year or so; I had never had a similar problem with redash, and I have been using it for about 7 years (the oldest query on my self-hosted server is from May 2017)!
For example, return up to 1000 items ...
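One way such a cap could work, sketched below with stdlib `sqlite3` (the query-wrapping approach and the `MAX_ROWS` constant are illustrative, not Redash's actual implementation): wrap the user's SQL in an outer `LIMIT` so the database never sends back more than a fixed number of rows.

```python
import sqlite3

MAX_ROWS = 1000  # illustrative cap, not a real Redash setting

def run_capped(conn, user_sql):
    """Wrap the user's query so at most MAX_ROWS rows come back."""
    capped = f"SELECT * FROM ({user_sql}) AS sub LIMIT {MAX_ROWS}"
    return conn.execute(capped).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passenger (id INTEGER)")
conn.executemany("INSERT INTO passenger VALUES (?)", ((i,) for i in range(5000)))

rows = run_capped(conn, "SELECT * FROM passenger")
print(len(rows))  # capped at 1000 even though the table has 5000 rows
```

Because the limit is applied inside the SQL, the excess rows never leave the database, which is what protects the worker's memory.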
Oh. I thought that had already been implemented for most data sources ages ago. It sounds like you're either not using one of the ones it's implemented for, or there may indeed be a bug. :frowning:
are we talking about https://github.com/getredash/redash/issues/78? it's been 10 years and still open :grin:
No, I don't think this is related to #78. This is a new issue; I don't remember having it before.
well i remember differently :man_shrugging:
as https://github.com/getredash/redash/issues/78 indicates, redash has never had the capability of dealing with `select * from big_table`. of all the redash versions i have used (from v0.7.0 to 24.07.0-dev), if you do `select * from big_table` where the table has millions of rows, redash will ALWAYS experience out-of-memory. this includes MySQL, postgresql/redshift, presto, etc.
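The out-of-memory pattern described here comes from the worker fetching the entire result set into memory at once. A minimal sketch of the bounded alternative, again with stdlib `sqlite3` standing in (real drivers such as psycopg2 expose the same idea through `fetchmany()` and server-side cursors):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER)")
conn.executemany("INSERT INTO big_table VALUES (?)", ((i,) for i in range(100_000)))

cur = conn.execute("SELECT * FROM big_table")

# Stream in fixed-size batches: only batch_size rows are resident
# on the client at any moment, instead of the whole table.
batch_size = 10_000
total = 0
while True:
    batch = cur.fetchmany(batch_size)
    if not batch:
        break
    total += len(batch)  # process / serialize the batch here

print(total)  # every row is seen, but never more than 10,000 at once
```

Whether this helps in practice depends on the data source driver; some fetch everything up front regardless, which matches the behaviour reported above.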
if you're using aws, you can always set a maximum-returned-rows limit on the db side. we set 200k as the limit; whenever the query results exceed 200k, the db cancels the query
No, we use self-hosted postgres. For now we cancel the query after 30 s, but this isn't a good solution for our needs.
then i suggest "monitoring + education + training". a couple of years ago we had the same problem: inexperienced people doing `select * from big_table` and bringing down the whole service. we have been doing the following:

- monitoring who's doing what on redash
- educating users on what they should and shouldn't do on redash
- training programs with sufficient documentation
- setting maximum-returned-rows to 200k where possible (of course not all db services let you set a limit)

little to no out-of-memory issues afterwards
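When the database offers no such setting, the 200k cap in the last point can be approximated on the client. A sketch with `sqlite3` standing in and an illustrative `ROW_CAP` (the helper and exception names are made up for this example): count rows as they stream in and abandon the query once the cap is exceeded, instead of materializing a runaway result.

```python
import sqlite3

ROW_CAP = 200_000  # mirrors the 200k server-side limit; illustrative value

class RowCapExceeded(Exception):
    pass

def fetch_with_cap(cur, cap=ROW_CAP, batch_size=10_000):
    """Collect rows, but bail out instead of materializing a runaway result."""
    rows = []
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            return rows
        rows.extend(batch)
        if len(rows) > cap:
            raise RowCapExceeded(f"query returned more than {cap} rows")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER)")
conn.executemany("INSERT INTO big_table VALUES (?)", ((i,) for i in range(250_000)))

try:
    fetch_with_cap(conn.execute("SELECT * FROM big_table"))
except RowCapExceeded as exc:
    print(exc)  # the query is abandoned instead of eating all the memory
```

Unlike a server-side limit this still transfers up to `cap` rows, but it bounds the worker's memory at a predictable ceiling.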
Thank you @zachliu I'll try it this way!
Hello, I'm using the latest redash version (from github): commit 76b5a30fd97330445b0d28553bf91e50fcc10e7f. It is installed on a bare-metal alma 9 server. After updating to this version I experience very large memory consumption from the rq worker (I start it with `python manage.py rq worker`). It takes all the memory of the server (around 15 GB, and even the swap file), making the server unresponsive while the query runs. The query finishes after 10 minutes or so and returns the results, so I don't get any actual error. However, the same query takes 1-2 seconds when I run it in dbeaver and returns ~2000 rows.
Please notice that I experience this behavior only on that query, for other (smaller) queries it works without problems.
The query is run on postgres 15.
Thank you for any help