cloudera / hue

Open source SQL Query Assistant service for Databases/Warehouses
https://cloudera.com
Apache License 2.0
1.17k stars 366 forks source link

Hue export to csv or excel results in re-execution of the query #3420

Open satvik1992 opened 1 year ago

satvik1992 commented 1 year ago

Is there an existing issue for this?

Description

Hue export to csv or excel results in re-execution of the query . This is a performance issue because a complex query will need time to execute and re-execution of a complex query during download is a overhead .

The version of hue used is 4.7.1

We are currently using Hue presto with hive meta-store and Mariadb on a Kubernetes Environment .

Download QueryExecution

Steps To Reproduce

  1. Run a complex query that takes a considerable amount of time to fetch results
  2. Select the download to csv or excel option

Logs

============= logs when we execute the query ===================== [31/Jul/2023 03:34:02 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:05 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:05 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:08 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:08 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:11 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:11 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:14 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:14 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:17 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:17 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:18 -0700] presto INFO select * from hue__tmp limit 1000

============== logs when download to csv/excel option is clicked ==============

[31/Jul/2023 03:34:26 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:26 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:29 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:29 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:32 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:32 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:35 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:35 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:38 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:38 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:41 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:41 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:41 -0700] presto INFO select * from hue__tmp limit 1000

Hue version

4.7.1

wachoo commented 1 year ago

it may not be a bug. it is better to review the code in this flle: desktop\libs\notebook\src\notebook\connectors\base.py, and there is a function 'download' and a class 'ExecutionWrapper' about how to download csv/excel.

def download(self, notebook, snippet, file_format='csv'):
    from beeswax import data_export #TODO: Move to notebook?
    from beeswax import conf

    result_wrapper = ExecutionWrapper(self, notebook, snippet)

    max_rows = conf.DOWNLOAD_ROW_LIMIT.get()
    max_bytes = conf.DOWNLOAD_BYTES_LIMIT.get()

    content_generator = data_export.DataAdapter(result_wrapper, max_rows=max_rows, max_bytes=max_bytes)
    return export_csvxls.create_generator(content_generator, file_format)
bjornalm commented 1 year ago

@satvik1992 Thanks for reporting. Are you using sqlalchemy? As I understand Hue doesn't cache the result for sqlalchemy interface, and that is why the question is executed a second time when when exporting.

satvik1992 commented 1 year ago

@bjornalm yes we are using sqlalchemy and configuration is as below :-

interpreters: |
  [[[hive]]]
  name = Hive
  interface=hiveserver2

  [[[mysql]]]
  name = Mariadb
  interface=sqlalchemy
  options='{"url": "mysql://hue:hue@xxx-xxxxx-mariadb:3306/hue"}'

  [[[presto]]]
  name = Presto
  interface=sqlalchemy
  options='{"url": "presto://xxx-xxxxx-coordinator:80/hive/default"}'
bjornalm commented 1 year ago

@Harshg999 we discussed this briefly. Are there any planned improvements for sqlalchemy, if not should we add it to the agenda?

twoyang0917 commented 1 year ago

+1

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.

github-actions[bot] commented 10 months ago

This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.

github-actions[bot] commented 9 months ago

This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.

satvik1992 commented 2 weeks ago

Can we re-open this issue ?

Could you please inform which interfaces support caching the query results from presto in hue?

Also, can hue's task server be used for caching the query results ? If yes then could please provide a resource for configuring this?

Can cached results be stored to S3? If yes could you please provide the resources for this

bjornalm commented 2 weeks ago

Ping @Harshg999 :-)