getredash / redash

Very high memory consumption after updating to latest redash version #7048

Open spapas opened 2 months ago

spapas commented 2 months ago

Hello, I'm using the latest redash version (from github): commit 76b5a30fd97330445b0d28553bf91e50fcc10e7f. This has been installed on a bare-metal Alma 9 server. After updating to this version I experience very large memory consumption from the rq worker (I start it with python manage.py rq worker). It takes all the memory of the server (like 15 GB, and even the swap file), making the server unresponsive while the query runs. The query finishes after 10 minutes or so and returns the results, so I don't get any actual error.

However, the same query takes 1-2 seconds when I run it in DBeaver and returns ~2000 rows.

Please note that I experience this behavior only with that query; other (smaller) queries run without problems.

The query is run on postgres 15.
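(If it helps with debugging: a quick sanity check on the database side, sketched below with a placeholder table and filter, can confirm that Postgres itself handles the query cheaply and that the memory blow-up happens on the Redash worker side.)

-- sketch: time the query directly in Postgres, independent of Redash
-- (placeholder table/filter; substitute the actual misbehaving query)
explain (analyze, buffers)
select * from some_table where created_at >= now() - interval '1 day';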

Thank you for any help

spapas commented 1 month ago

Hello friends, this problem is still bugging us. We use the latest redash version and try to execute a query as simple as this:

select * from passenger

Please notice that the passenger table is really big (around 200M rows). The problem is that when we run the query, the redash rq worker starts consuming huge amounts of memory (like 10 GB, all the free memory of the server) and makes the whole server unresponsive.

Can you please give a little love to this because we can't really use redash if we have this problem.

spapas commented 1 month ago

There must definitely be a memory leak somewhere. Can you try running a query that returns a lot of results on a postgresql database to check the behaviour?

justinclift commented 1 month ago

select * from passenger

That's a good example of a SQL query to never run, as that will attempt to return the entire contents of the table (all fields, all data) back to Redash, which will attempt to fit that in memory.

Instead, you should be running queries that extract just the summary data you want. Something like this:

select count(*) from passenger

Which will return the number of rows in the passenger table.
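If you need more than a single number, the same idea applies: aggregate before the data leaves the database. A sketch (the travel_class column is made up for illustration):

-- summary per group instead of raw rows ("travel_class" is an illustrative column name)
select travel_class, count(*) as passengers
from passenger
group by travel_class
order by passengers desc;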

spapas commented 1 month ago

First of all, I am aware that this query returns a lot of data and (usually) shouldn't be run. However, please notice:

For now I have implemented a hacky solution of allowing queries to run for up to 30 seconds before killing them, so the leak can't grow big enough to make the server unresponsive. However, this means that we can't run most queries in redash and need to use dbeaver or psql directly.
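(For anyone doing something similar: one way to enforce such a cap is on the Postgres side with a per-role statement_timeout. A sketch, where the role name "redash" is illustrative:)

-- sketch: let Postgres cancel anything from the Redash role after 30 seconds
-- ("redash" is an illustrative role name; adjust to whatever role Redash connects with)
alter role redash set statement_timeout = '30s';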

As I said before, this is definitely a bug that was introduced within the last year or so; I had never had a similar problem with redash, and I have been using it for about 7 years (the oldest query on my self-hosted server is from May 2017)!

justinclift commented 1 month ago

For example return up to 1000 items ...

Oh. I thought that had already been implemented for most data sources ages ago. It sounds like you're either not using one of the ones it's implemented for, or there may indeed be a bug. :frowning:
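For reference, the kind of cap being discussed just bounds what a single exploratory query can hand back, e.g.:

-- cap an exploratory query instead of pulling the whole table
select * from passenger limit 1000;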

zachliu commented 1 month ago

are we talking about https://github.com/getredash/redash/issues/78? it's been 10 years and still open :grin:

spapas commented 1 month ago

No, I don't think this is related to #78. This is a new issue; I don't remember having it before.

zachliu commented 1 month ago

well i remember differently :man_shrugging: as https://github.com/getredash/redash/issues/78 indicates, redash has never been able to deal with select * from big_table in any of the redash versions i have used (from v0.7.0 to 24.07.0-dev). if you do select * from big_table where the table has millions of rows, redash will ALWAYS run out of memory. this includes MySQL, postgresql/redshift, presto, etc.

zachliu commented 1 month ago

if you're using aws, you can always set a maximum-returned-rows limit on the db side. we set 200k as the limit; whenever the query results exceed 200k, the db cancels the query

spapas commented 1 month ago

No, we use self-hosted postgres. For now we cancel the query after 30 s, but this isn't a good solution for our needs.

zachliu commented 1 month ago

then i suggest "monitoring + education + training". a couple of years ago we had the same problem: inexperienced people doing select * from big_table bringing down the whole service. we have been doing the following:

little to no out-of-memory issue afterwards
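As a rough sketch of what the monitoring piece can look like on the Postgres side (adjust the filter to however your Redash connection identifies itself):

-- sketch: spot currently-running queries that have been active for a while
select pid,
       usename,
       now() - query_start as runtime,
       left(query, 120) as query_text
from pg_stat_activity
where state = 'active'
  and now() - query_start > interval '1 minute'
order by runtime desc;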

spapas commented 1 month ago

Thank you @zachliu I'll try it this way!