cloudera / hue

Open source SQL Query Assistant service for Databases/Warehouses
https://cloudera.com
Apache License 2.0

Hue stops polling for results abruptly after 5 min, leading to a network error on large CSV downloads #3435

Closed satvik1992 closed 1 year ago

satvik1992 commented 1 year ago

Is there an existing issue for this?

Description

We are getting an error while downloading Presto query results from the Hue UI. We use Hue 4.7.1 to execute queries on Presto and, after a query finishes, use the Hue UI option to download the result as CSV. The download progresses for some time and then fails; the browser reports "Failed - Network error". We do not find any error logs on the Hue pods.

We observe this issue for large CSV downloads, and the root cause we have identified is that Hue stops polling for results abruptly after 5 minutes. This happens 4 out of 5 times, and the download then fails with a network error. Is there a configuration or timeout parameter on the Hue side that can fix this issue?

[Screenshot: NetworkErrorCsv]

Steps To Reproduce

  1. Run a query that contains at least 5 columns and 10 million rows.
  2. Click on the option to download as a csv
  3. Observe that the download fails with a "Network error" message.
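
To check the roughly 5-minute cutoff independently of the browser, the same download can be timed from a script. The following is only a minimal sketch: `time_stream` and `iter_url` are hypothetical helper names, and the download URL and any session headers are assumptions that have to be taken from your own Hue session (this thread does not give them).

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


def time_stream(chunks: Iterable[bytes],
                clock: Callable[[], float] = time.monotonic) -> Tuple[int, float, str]:
    """Consume a stream of byte chunks and report how far it got.

    Returns (bytes_received, seconds_elapsed, outcome), where outcome is
    "complete" or the name of the exception that interrupted the stream.
    """
    start = clock()
    received = 0
    outcome = "complete"
    try:
        for chunk in chunks:
            received += len(chunk)
    except Exception as exc:  # e.g. the connection being dropped mid-download
        outcome = type(exc).__name__
    return received, clock() - start, outcome


def iter_url(url: str, headers: Optional[Dict[str, str]] = None,
             chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the response body of `url` in chunks.

    `url` and `headers` (for example a session cookie) are placeholders;
    copy the real values from the failing browser request.
    """
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Calling `time_stream(iter_url(download_url, headers))` with the values from the failing request returns the byte count, the elapsed seconds, and how the stream ended. If the elapsed time is consistently close to 300 seconds when the download fails, that points to a fixed timeout somewhere between the browser and Presto rather than to the data itself.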

Logs

No response

Hue version

4.7.1

bjornalm commented 1 year ago

@ranade1 I know this is for an old release, but do you know anything about this? Can it be configured?

ranade1 commented 1 year ago

Could you please provide more details on the following?

satvik1992 commented 1 year ago

@ranade1 please find the details below :-

  1. Which version of Hue are you using? - 4.7.1
  2. Is Hue running on a CDH On-Prem setup? - It's running as CDH in a Kubernetes environment
  3. What is the size of the data file you are trying to download? - around 15 GB, as both Excel and CSV
  4. Can you share the configuration specifics of the Hue server?

interpreters: |
  [[[hive]]]
  name = Hive
  interface=hiveserver2

  [[[mysql]]]
  name = Mariadb
  interface=sqlalchemy
  options='{"url": "mysql://hue:hue@xxx-xxxxx-mariadb:3306/hue"}'

  [[[presto]]]
  name = Presto
  interface=sqlalchemy
  options='{"url": "presto://xxx-xxxxx-coordinator:80/hive/default"}'

  [beeswax]
  download_bytes_limit=-1
  cherrypy_server_threads=100
  download_row_limit=-1

Let me know if these details suffice.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.