[Bug]-[3.5]:Certain Redshift queries time out

areyabhishek commented 3 years ago

Description

User reported that they find a few Redshift queries timing out even though they've expanded the timeout. They see a message: ERROR: Query cancelled on user's request.

User says that this happens for complex queries or where the number of rows in the response are higher.

The error shows up inconsistently. From Discord: "so just to clarify, if I try running the same query 5-10 times, it ends up giving an output. It's not that it never renders an output.

This is agnostic of the size of the response, it even happens for output which is lesser than 20 entries.

The size of the dataset being queried is 300K entries"

User is on cloud hosted edition.

areyabhishek commented 3 years ago

@sumitsum could you check out this issue with Redshift queries?

avishekwrx commented 3 years ago

Hi Appsmithorg/Appsmith,

Just to clarify, these queries are running via other tools in redshift, within 5-7 seconds. If I run the same query through redash, it runs effortlessly.

Regards, Avishek Mitra

On Wed, May 5, 2021 at 12:10 AM abhishek nayak @.***> wrote:

@sumitsum https://github.com/sumitsum could you check out this issue with Redshift queries?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/appsmithorg/appsmith/issues/4276#issuecomment-832159461, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATNB4ENNJ5FFLPF43V2XWMDTMA5QTANCNFSM44BRRLAQ .

avishekwrx commented 3 years ago

Hi, Sharing a snapshot of the same, with the error

sumitsum commented 3 years ago

Hi @avishekwrx , thanks for bringing this issue to our notice. It seems the timeout is triggered by Redshift:

The statement_timeout value is the maximum amount of time that a query can run before Amazon Redshift terminates it. When a statement timeout is exceeded, then queries submitted during the session are aborted with the following error message:

ERROR:  Query (150) cancelled on user's request
To verify whether a query was aborted because of a statement timeout, run following query:

select * from SVL_STATEMENTTEXT where text ilike '%set%statement_timeout%to%' and pid in (select pid from STL_QUERY where query = <queryid>);
Statement timeouts can also be set in the cluster parameter group. Check your cluster parameter group and any statement_timeout configuration settings for additional confirmation. For more information about the cluster parameter group and statement_timeout settings, see Modifying a parameter group.

src: https://aws.amazon.com/premiumsupport/knowledge-center/redshift-query-abort/

Regarding why it is working with Redash, it may be because Redash caches the result:

Get results, fast - shorter on-demand running times, all query results are cached, so you don't have to wait for the same result set every time.

src: https://redash.io/data-sources/amazon-redshift

APMU there are two ways to overcome this issue:

Increase the Redshift timeout.
Use offset based pagination in case you are displaying the data in a table : https://docs.appsmith.com/core-concepts/displaying-data-read/display-data-tables#pagination

sam6230i commented 3 years ago

We are also facing the same issue with multiple redshift queries in the dashboard. We have tried increasing the timeout and still the problem persist. This is happening with some simple as well as with some complex queries, where we are trying to fetch only a few records (max 20 records). So trying out pagination in such queries does not make sense.

avishekwrx commented 3 years ago

Hi @sumitsum : Thank you for the detailed response. Adding to the pagination issue which Sammer mentioned above,the timeout has been increased 10X from the default 10k ms threshold to 100K ms, beyond which it would not help further.

sumitsum commented 3 years ago

Hi @avishekwrx, can you please try out the following once:

the documentation says that a min of WLM timeout and statement_timeout is used if both are specified. Can you please check if WLM timeout is specified for you ? If so, can you please try by increasing that as well.
Can you please try running the query after setting the statement_timeout to 0 which indicates no limit.

If WLM timeout (max_execution_time) is also specified as part of a WLM configuration, the lower of statement_timeout and max_execution_time is used. For more information, see WLM timeout.

https://docs.aws.amazon.com/redshift/latest/dg/r_statement_timeout.html

sumitsum commented 3 years ago

Hi @avishekwrx, would it be possible for you to share your app with me (sumit@appsmith.com) ? This will help me understand the problem better and come up with a plausible solution / workaround which we can then discuss later on a zoom call possibly.

Nikhil-Nandagopal commented 3 years ago

@avishekwrx we're closing this issue for now. Feel free to reopen it, if it has not been resolved yet.

avishekwrx commented 3 years ago

Hi Nikhil, this issue is still recurring. The resolution provided by the team is not helpful.

If possible, we could get on a call for me to show you this issue. We have also reconfigured our redshift back-end to provide optimum speed in other dashboarding tools.

Avishek

On Fri, Jul 16, 2021, 12:33 Nikhil Nandagopal @.***> wrote:

Closed #4276 https://github.com/appsmithorg/appsmith/issues/4276.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/appsmithorg/appsmith/issues/4276#event-5029211155, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATNB4EJZE5IAY5Q6JBYKUHTTX7KV3ANCNFSM44BRRLAQ .

trishaanand commented 3 years ago

@avishekwrx : Could you book a time which works the best for you for a debugging session for this issue? : https://calendly.com/trishaanand

trishaanand commented 3 years ago

Hi @avishekwrx . Haven't heard from you. Is it possible for us to get on a call sometime so that we can debug what is causing this behaviour?

avishekwrx commented 3 years ago

Sure Trisha, We can do it at 3PM tomorrow.

On Wed, Jul 21, 2021 at 5:39 PM Trisha Anand @.***> wrote:

Hi @avishekwrx https://github.com/avishekwrx . Haven't heard from you. Is it possible for us to get on a call sometime so that we can debug what is causing this behaviour?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/appsmithorg/appsmith/issues/4276#issuecomment-884139568, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATNB4EO7WT2FYA75DWKP3Z3TY22GZANCNFSM44BRRLAQ .

trishaanand commented 3 years ago

@avishekwrx : Please join us for a call at 3 PM (IST) on the 22nd of July on zoom here : https://us02web.zoom.us/j/88915804150?pwd=R0Mvb3ZHMjVHaXlMS09XVXh2TS8wZz09

trishaanand commented 3 years ago

@avishekwrx : Hi! We have joined the call and are waiting for you. Please find the zoom link here : https://us02web.zoom.us/j/88915804150?pwd=R0Mvb3ZHMjVHaXlMS09XVXh2TS8wZz09

Nikhil-Nandagopal commented 3 years ago

@trishaanand were we able to triage this issue?

sumitsum commented 3 years ago

@Nikhil-Nandagopal I was trying to reproduce this issue, but need some more time. I will try to prioritize this.

rohan-arthur commented 2 years ago

Redshift 6m usage: 28 Assume 25% reach: 7

Stats

Stat	Values
Reach	7
Effort (months)	1

pearsonhenri commented 2 years ago

We're running into what appears to be a related issue: Redshift queries are timing out, and in the Appsmith logs, we're seeing things like this:

[11/Nov/2022:17:48:06 +0000] "POST /api/v1/actions/execute HTTP/1.1" 499 0 "https://appsmith.hub.legiti.io/app/rt-simulator/page1-63484816bd0ab448c77f6e95/edit" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"
--
Fri, Nov 11 2022 12:48:16 pm | editor stdout \|
Fri, Nov 11 2022 12:48:16 pm | backend stdout \| [2022-11-11 17:48:07,122] - Connection leak detection triggered for com.amazon.redshift.jdbc.RedshiftConnectionImpl@39c77ded on thread elastic-1997, stack trace follows
Fri, Nov 11 2022 12:48:16 pm | backend stdout \| java.lang.Exception: Apparent connection leak detected

That 499 HTTP status code seems to indicate that the client (Appsmith?) is terminating the request, perhaps due to some misconfigured connection leak detection (perhaps being triggered by longer-running queries).

It would be great if there were a configurable parameter for query timeouts (if that's indeed the root cause here). As it stands, this makes Appsmith unsuitable for many of our use cases.

appsmithorg / appsmith

[Bug]-[3.5]:Certain Redshift queries time out #4276

Description

Stats