data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (such as business users, analysts, and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0
221 stars, 78 forks

Worksheet Run Query - 504 Gateway Timeout #1342

Open · sandeephs1 opened 4 weeks ago

sandeephs1 commented 4 weeks ago

Describe the bug

Running a query in a worksheet fails with the following messages: "Network error occurred" and "Network error: Failed to fetch".

How to Reproduce

Create a dataset, load it with Parquet data, create a worksheet, and run a query that fetches data from the dataset. If the query is limited to 10 or 5000 rows, it completes; if the limit exceeds a certain number, such as 15000, it fails with the errors "Network error occurred" and "Network error: Failed to fetch". NOTE: The exact limit at which the query starts failing could not be determined.

Expected behavior

The worksheet should fetch the results and display them.

Your project

No response

Screenshots

No response

OS

macOS Sonoma 14.3

Python version

3.9.13

AWS data.all version

2.0.0

Additional context

We noticed that when the data is large, Run Query ends with a “504 Gateway Timeout” error in the application console, even though Athena completes the query execution.

dlpzx commented 2 weeks ago

Hi @sandeephs1, sorry for the late response. This error can be caused by API Gateway. A 504 Gateway Timeout is a very typical error that appears when the API takes longer than 29 seconds to respond. This means that even if the Lambda function has a timeout of 15 minutes, it needs to return a response to API Gateway in less than 29 seconds. In your case, the API call waits for a response from Athena, and for "long" queries this results in a timeout of the API call.
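To illustrate the constraint, here is a minimal sketch, assuming a handler that polls a query's state and must answer API Gateway before the 29-second timeout. The names `wait_with_deadline` and `get_state`, and the 25-second safety margin, are hypothetical. In a real deployment `get_state` might wrap Athena's `get_query_execution` call; here it is injected (along with the clock and sleep functions) so the logic runs without AWS.

```python
import time
from typing import Callable

# API Gateway's default REST integration timeout is 29 s; leave a margin
# for Lambda overhead and response serialization (hypothetical value).
API_DEADLINE_SECONDS = 25.0

# Terminal query states, as reported by Athena for a query execution.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_with_deadline(
    execution_id: str,
    get_state: Callable[[str], str],
    deadline_seconds: float = API_DEADLINE_SECONDS,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a query until it finishes or the API deadline is near.

    Instead of blocking past the gateway timeout (which surfaces as a 504
    even though the query keeps running), return a PENDING response that
    carries the execution ID so the client can check back later.
    """
    start = clock()
    while True:
        state = get_state(execution_id)
        if state in TERMINAL_STATES:
            return {"executionId": execution_id, "status": state}
        # Stop before the next sleep would push us past the deadline.
        if clock() - start + poll_interval >= deadline_seconds:
            return {"executionId": execution_id, "status": "PENDING"}
        sleep(poll_interval)
```

The key design choice is that the handler never waits longer than the gateway allows: a slow query produces a PENDING answer rather than a 504.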

For this reason, data.all has an asynchronous Worker Lambda. The API handler returns the response to API Gateway, while the Worker "works" on executing the longer-running tasks.
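The handler/worker split can be sketched as follows. This is a self-contained illustration, not data.all's code: an in-memory `queue.Queue` stands in for whatever mechanism actually passes tasks from the API handler to the Worker Lambda, and a dict stands in for the task-status store. `submit_query`, `run_worker_once`, `TASK_QUEUE`, and `TASKS` are hypothetical names.

```python
import queue
import uuid
from typing import Callable, Dict

# Hypothetical stand-ins for real infrastructure: a task queue between the
# API handler and the worker, and a status store the client can poll.
TASK_QUEUE = queue.Queue()
TASKS: Dict[str, dict] = {}


def submit_query(sql: str) -> dict:
    """API handler: record the task, enqueue it, and return immediately.

    Returning right away keeps the API call well under the gateway timeout,
    regardless of how long the query itself will take.
    """
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"status": "PENDING", "result": None}
    TASK_QUEUE.put({"taskId": task_id, "sql": sql})
    return {"taskId": task_id, "status": "PENDING"}


def run_worker_once(execute: Callable[[str], list]) -> None:
    """Worker: take one task off the queue and run it to completion.

    `execute` stands in for the long-running work (running the query and
    collecting results); it is not bound by the API timeout.
    """
    task = TASK_QUEUE.get()
    try:
        result = execute(task["sql"])
        TASKS[task["taskId"]] = {"status": "SUCCEEDED", "result": result}
    except Exception as exc:  # record failures so the client can see them
        TASKS[task["taskId"]] = {"status": "FAILED", "result": str(exc)}
    finally:
        TASK_QUEUE.task_done()
```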


Until very recently, the 29-second limit was a hard limit. Because of the increasing number of REST APIs that use generative AI under the hood, API Gateway announced on June 4th that the 29-second integration timeout can now be exceeded; see https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-api-gateway-integration-timeout-limit-29-seconds/ for details. I have not tried this new feature yet, but it could be useful in your case.
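If you want to experiment with a longer timeout, one way is the API Gateway `UpdateIntegration` API, which takes a `patchOperations` list; the `/timeoutInMillis` path sets the integration timeout. This is a hedged sketch: the REST API ID, resource ID, HTTP method, and 60-second value are placeholders, and `build_timeout_patch` / `apply_timeout` are hypothetical helpers. Per the announcement, values above 29 seconds apply to Regional and private REST APIs and may involve a trade-off with the account-level throttle quota, so check the quota documentation first.

```python
from typing import Dict, List

# Documented lower bound for a REST integration timeout.
MIN_TIMEOUT_MS = 50


def build_timeout_patch(timeout_ms: int) -> List[Dict[str, str]]:
    """Build the patchOperations list that sets an integration's timeout.

    API Gateway patch values are strings, so the integer is converted.
    """
    if timeout_ms < MIN_TIMEOUT_MS:
        raise ValueError(f"timeout must be at least {MIN_TIMEOUT_MS} ms")
    return [{"op": "replace", "path": "/timeoutInMillis", "value": str(timeout_ms)}]


def apply_timeout(rest_api_id: str, resource_id: str, http_method: str, timeout_ms: int) -> dict:
    """Apply the timeout to one integration (needs boto3 and AWS credentials).

    Values above 29,000 ms may be rejected unless the account's integration
    timeout quota has been raised.
    """
    import boto3  # imported lazily so build_timeout_patch works without boto3

    client = boto3.client("apigateway")
    return client.update_integration(
        restApiId=rest_api_id,
        resourceId=resource_id,
        httpMethod=http_method,
        patchOperations=build_timeout_patch(timeout_ms),
    )
```

For a REST API, the change only takes effect after the API is redeployed to a stage (for example with the `create_deployment` call). The Lambda function behind the integration also needs its own timeout to be at least as long as the new integration timeout.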

dlpzx commented 4 days ago

Hi @sandeephs1, any update on this issue?

sandeephs1 commented 4 days ago

Hi @dlpzx, yes, we were able to handle it by making the calls asynchronous. For us this applies to two features: "Run Query" and "Download" of the result output. We are currently testing it and will follow up once testing is complete.
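With Run Query made asynchronous, the client typically starts the query, polls its status, and only then fetches or downloads the output. A minimal sketch under that assumption: `start`, `get_status`, and `fetch` are hypothetical callables standing in for the actual API calls (each of which returns well within the gateway timeout), and the backoff parameters are illustrative, not taken from data.all.

```python
import time
from typing import Callable


def run_query_async(
    start: Callable[[], str],
    get_status: Callable[[str], str],
    fetch: Callable[[str], object],
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    max_wait: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Start a query, poll its status with exponential backoff, then fetch.

    Each individual call is short, so none of them hits the 29 s gateway
    limit even when the query itself runs for minutes.
    """
    task_id = start()
    delay, waited = initial_delay, 0.0
    while waited < max_wait:
        status = get_status(task_id)
        if status == "SUCCEEDED":
            return fetch(task_id)  # e.g. read results or a download link
        if status in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"query {task_id} ended with status {status}")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
    raise TimeoutError(f"query {task_id} did not finish within {max_wait} s")
```

The same loop works for both features: for "Download", `fetch` would return the location of the result file instead of the rows.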