Open satvik1992 opened 1 year ago
it may not be a bug. it is better to review the code in this flle: desktop\libs\notebook\src\notebook\connectors\base.py, and there is a function 'download' and a class 'ExecutionWrapper' about how to download csv/excel.
def download(self, notebook, snippet, file_format='csv'):
from beeswax import data_export #TODO: Move to notebook?
from beeswax import conf
result_wrapper = ExecutionWrapper(self, notebook, snippet)
max_rows = conf.DOWNLOAD_ROW_LIMIT.get()
max_bytes = conf.DOWNLOAD_BYTES_LIMIT.get()
content_generator = data_export.DataAdapter(result_wrapper, max_rows=max_rows, max_bytes=max_bytes)
return export_csvxls.create_generator(content_generator, file_format)
@satvik1992 Thanks for reporting. Are you using sqlalchemy? As I understand Hue doesn't cache the result for sqlalchemy interface, and that is why the question is executed a second time when when exporting.
@bjornalm yes we are using sqlalchemy and configuration is as below :-
interpreters: |
[[[hive]]]
name = Hive
interface=hiveserver2
[[[mysql]]]
name = Mariadb
interface=sqlalchemy
options='{"url": "mysql://hue:hue@xxx-xxxxx-mariadb:3306/hue"}'
[[[presto]]]
name = Presto
interface=sqlalchemy
options='{"url": "presto://xxx-xxxxx-coordinator:80/hive/default"}'
@Harshg999 we discussed this briefly. Are there any planned improvements for sqlalchemy, if not should we add it to the agenda?
+1
This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.
This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.
This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.
Can we re-open this issue ?
Could you please inform which interfaces support caching the query results from presto in hue?
Also, can hue's task server be used for caching the query results ? If yes then could please provide a resource for configuring this?
Can cached results be stored to S3? If yes could you please provide the resources for this
Ping @Harshg999 :-)
Is there an existing issue for this?
Description
Hue export to csv or excel results in re-execution of the query . This is a performance issue because a complex query will need time to execute and re-execution of a complex query during download is a overhead .
The version of hue used is 4.7.1
We are currently using Hue presto with hive meta-store and Mariadb on a Kubernetes Environment .
Steps To Reproduce
Logs
============= logs when we execute the query ===================== [31/Jul/2023 03:34:02 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:05 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:05 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:08 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:08 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:11 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:11 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:14 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:14 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:17 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:17 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:18 -0700] presto INFO select * from hue__tmp limit 1000
============== logs when download to csv/excel option is clicked ==============
[31/Jul/2023 03:34:26 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:26 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:29 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:29 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:32 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:32 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:35 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:35 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:38 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:38 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0 [31/Jul/2023 03:34:41 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:41 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0 [31/Jul/2023 03:34:41 -0700] presto INFO select * from hue__tmp limit 1000
Hue version
4.7.1