Open spapas opened 4 months ago
Hello friends, this problem is still bugging us. We use the latest redash version and try to execute a query as simple as this:
select * from passenger
Please notice that the passenger table is really big (about 200M rows). The problem is that when we run the query, the redash rqworker starts consuming huge amounts of memory (around 10 GB, all the free memory of the server) and makes the whole server unresponsive.

Can you please give this a little love, because we can't really use redash while we have this problem.

There must definitely be a memory leak somewhere. Can you try running a query that returns a lot of results on a postgresql database to check the behaviour?
select * from passenger
That's a good example of a SQL query to never run, as that will attempt to return the entire contents of the table (all fields, all data) back to Redash, which will attempt to fit that in memory.
Instead you should be running queries that extract the summary data you want. Something like this instead:
select count(*) from passenger
Which will return the number of rows in the passenger table.
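To make the contrast concrete, here is a minimal sketch (using Python's stdlib `sqlite3` as a stand-in for PostgreSQL, with an illustrative `passenger` table): the `select *` form materializes every row on the client, while the aggregate returns a single row no matter how big the table is.

```python
import sqlite3

# Stand-in for the real PostgreSQL database: an in-memory
# SQLite table named "passenger" with 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passenger (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO passenger VALUES (?, ?)",
    ((i, f"p{i}") for i in range(10_000)),
)

# select * pulls the entire table into client memory at once...
all_rows = conn.execute("SELECT * FROM passenger").fetchall()
print(len(all_rows))  # 10000 rows held in memory simultaneously

# ...while the aggregate ships back exactly one row.
(count,) = conn.execute("SELECT count(*) FROM passenger").fetchone()
print(count)
```

At 200M rows the first pattern is what exhausts the worker's memory; the second stays constant-size regardless of table size.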
First of all, I am aware that this query returns a lot of data and shouldn't (usually) be run, however please notice:

- For now I have implemented a hacky workaround: queries are allowed to run for up to 30 seconds before being killed, so the leak can't grow big enough to make the server unresponsive. However, this means we can't run most queries in redash and need to use dbeaver or psql directly.
- As I said before, this is definitely a bug that was introduced within the last year or so; I had never had a similar problem with redash, and I have been using it for about 7 years (the oldest query on my self-hosted server is from May 2017)!
For example, return up to 1000 items ...
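One way such a cap could work, sketched below with stdlib `sqlite3` (the query-wrapping approach and the `MAX_ROWS` constant are illustrative, not Redash's actual implementation): wrap the user's SQL in an outer `LIMIT` so the database never sends back more than a fixed number of rows.

```python
import sqlite3

MAX_ROWS = 1000  # illustrative cap, not a real Redash setting

def run_capped(conn, user_sql):
    """Wrap the user's query so at most MAX_ROWS rows come back."""
    capped = f"SELECT * FROM ({user_sql}) AS sub LIMIT {MAX_ROWS}"
    return conn.execute(capped).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passenger (id INTEGER)")
conn.executemany("INSERT INTO passenger VALUES (?)", ((i,) for i in range(5000)))

rows = run_capped(conn, "SELECT * FROM passenger")
print(len(rows))  # capped at 1000 even though the table has 5000 rows
```

Because the limit is applied inside the SQL, the excess rows never leave the database, which is what protects the worker's memory.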
Oh. I thought that had already been implemented for most data sources ages ago. It sounds like you're either not using one of the ones it's implemented for, or there may indeed be a bug. :frowning:
are we talking about https://github.com/getredash/redash/issues/78? it's been 10 years and still open :grin:
No, I don't think this is related to #78. This is a new issue; I don't remember having it before.
well i remember differently :man_shrugging:
as https://github.com/getredash/redash/issues/78 indicates, redash has never had the capability of dealing with `select * from big_table`. of all the redash versions i have used (from v0.7.0 to 24.07.0-dev), if you do `select * from big_table` where the table has millions of rows, redash will ALWAYS experience out-of-memory. this includes MySQL, postgresql/redshift, presto, etc.
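The out-of-memory pattern described here comes from the worker fetching the entire result set into memory at once. A minimal sketch of the bounded alternative, again with stdlib `sqlite3` standing in (real drivers such as psycopg2 expose the same idea through `fetchmany()` and server-side cursors):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER)")
conn.executemany("INSERT INTO big_table VALUES (?)", ((i,) for i in range(100_000)))

cur = conn.execute("SELECT * FROM big_table")

# Stream in fixed-size batches: only batch_size rows are resident
# on the client at any moment, instead of the whole table.
batch_size = 10_000
total = 0
while True:
    batch = cur.fetchmany(batch_size)
    if not batch:
        break
    total += len(batch)  # process / serialize the batch here

print(total)  # every row is seen, but never more than 10,000 at once
```

Whether this helps in practice depends on the data source driver; some fetch everything up front regardless, which matches the behaviour reported above.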
if you're using aws, you can always set a maximum-returned-rows limit on the db side. we set 200k as the limit; whenever the query results exceed 200k, the db cancels the query
No, we use self-hosted postgres. For now we cancel the query after 30 s, but this isn't a good solution for our needs.
then i suggest "monitoring + education + training". a couple of years ago we had the same problem: inexperienced people doing `select * from big_table` and bringing down the whole service. we have been doing the following:

- monitoring who's doing what on redash
- educating users on what they should and shouldn't do on redash
- training programs with sufficient documentation
- setting maximum-returned-rows to 200k where possible (of course not all db services let you set a limit)

little to no out-of-memory issues afterwards
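When the database offers no such setting, the 200k cap in the last point can be approximated on the client. A sketch with `sqlite3` standing in and an illustrative `ROW_CAP` (the helper and exception names are made up for this example): count rows as they stream in and abandon the query once the cap is exceeded, instead of materializing a runaway result.

```python
import sqlite3

ROW_CAP = 200_000  # mirrors the 200k server-side limit; illustrative value

class RowCapExceeded(Exception):
    pass

def fetch_with_cap(cur, cap=ROW_CAP, batch_size=10_000):
    """Collect rows, but bail out instead of materializing a runaway result."""
    rows = []
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            return rows
        rows.extend(batch)
        if len(rows) > cap:
            raise RowCapExceeded(f"query returned more than {cap} rows")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER)")
conn.executemany("INSERT INTO big_table VALUES (?)", ((i,) for i in range(250_000)))

try:
    fetch_with_cap(conn.execute("SELECT * FROM big_table"))
except RowCapExceeded as exc:
    print(exc)  # the query is abandoned instead of eating all the memory
```

Unlike a server-side limit this still transfers up to `cap` rows, but it bounds the worker's memory at a predictable ceiling.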
Thank you @zachliu I'll try it this way!
Hello, I'm using the latest redash version (from github): commit 76b5a30fd97330445b0d28553bf91e50fcc10e7f. It is installed on a bare-metal alma 9 server. After updating to this version I experience very large memory consumption from the rq worker (I start it with `python manage.py rq worker`). It takes all the memory of the server (around 15 GB, and even the swap file), making the server unresponsive while the query runs. The query finishes after 10 minutes or so and returns the results, so I don't get any actual error. However, the same query takes 1-2 seconds when I run it in dbeaver and returns ~2000 rows.
Please notice that I experience this behavior only on that query, for other (smaller) queries it works without problems.
The query is run on postgres 15.
Thank you for any help