cloudera / hue

Open source SQL Query Assistant service for Databases/Warehouses
https://cloudera.com
Apache License 2.0

SparkSql (Livy) WEB UI --- No data in exported CSV #2972

Closed divincode closed 1 year ago

divincode commented 2 years ago

Is the issue already present in https://github.com/cloudera/hue/issues or discussed in the forum https://discourse.gethue.com? No

Describe the bug: When we export data fetched with SparkSQL as CSV using hiveserver2 as the interface, it works. When we change the interface to livy, the exported CSV contains no data.

Steps to reproduce it? Run any query, such as show schemas, export the results as CSV with both interfaces, and compare the output.

Hue version or source? (e.g. open source 4.5, CDH 5.16, CDP 1.0...). System info (e.g. OS, Browser...). 4.10

divincode commented 2 years ago

One more thing I observed: when we press the export results button and then press export, I find the following error:

    [21/Aug/2022 21:11:48 -0700] api ERROR <notebook.connectors.spark_shell.SparkApi object at 0x7fdf3c333850>
    [21/Aug/2022 21:11:48 -0700] decorators ERROR Error running export_result
    Traceback (most recent call last):
      File "/usr/lib/hue/desktop/libs/notebook/src/notebook/decorators.py", line 119, in wrapper
        return f(*args, **kwargs)
      File "/usr/lib/hue/desktop/libs/notebook/src/notebook/api.py", line 853, in export_result
        response['watch_url'] = api.export_data_as_hdfs_file(snippet, destination, overwrite)
      File "/usr/lib/hue/desktop/libs/notebook/src/notebook/connectors/base.py", line 585, in export_data_as_hdfs_file
        raise NotImplementedError()
    NotImplementedError

The data export is not implemented in spark_shell.py. Digging deeper.
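The traceback above follows a common pattern: the base connector class only declares the method and raises NotImplementedError, so any connector that does not override it fails at call time. The sketch below illustrates that pattern in isolation. The class names, the `supports_export` helper, and the return value are hypothetical and are not Hue code; only the method name and its NotImplementedError come from the traceback.

```python
class BaseConnector:
    """Hypothetical stand-in for the base connector class."""

    def export_data_as_hdfs_file(self, snippet, target_file, overwrite):
        # As in the traceback: the base class only declares the method.
        raise NotImplementedError()


class ConnectorWithExport(BaseConnector):
    """A connector that overrides the export method."""

    def export_data_as_hdfs_file(self, snippet, target_file, overwrite):
        # Made-up return value, for illustration only.
        return 'watch:' + target_file


class ConnectorWithoutExport(BaseConnector):
    """A connector that inherits the unimplemented base method."""


def supports_export(connector):
    """Return True if the connector's class overrides the base export method."""
    own = type(connector).export_data_as_hdfs_file
    return own is not BaseConnector.export_data_as_hdfs_file


for connector in (ConnectorWithExport(), ConnectorWithoutExport()):
    print(type(connector).__name__, supports_export(connector))
```

A check like this could let a caller report "export not supported for this interface" instead of surfacing a bare NotImplementedError, although whether Hue does anything similar is not stated in this thread.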

divincode commented 2 years ago

Hi, I have looked into it. Here are my findings and a quick workaround.

    content_generator = get_api(request, snippet).download(notebook, snippet, file_format=file_format)
    response = export_csvxls.make_response(content_generator, file_format, file_name, user_agent=user_agent)

This is the basic export process: first the content is generated, and then the content is transformed so that it can be uploaded as a cookie. The problem is that the content is not generated properly. The connector file depends on the interface:

hiveserver2 - /notebook/connectors/hiveserver2.py
livy - /notebook/connectors/spark_shell.py

Each interface uses a different connector file. With livy, the following function is called to generate the data:

    def fetch_result(self, notebook, snippet, rows, start_over):
      api = self.get_api()
      session = _get_snippet_session(notebook, snippet)
      cell = snippet['result']['handle']['id']

      try:
        response = api.fetch_data(session['id'], cell)
      except Exception as e:
        message = force_unicode(str(e)).lower()
        if re.search("session ('\d+' )?not found", message):
          raise SessionExpired(e)
        else:
          raise e

      content = response['output']

      if content['status'] == 'ok':
        data = content['data']
        images = []

        try:
          table = data['application/vnd.livy.table.v1+json']
        except KeyError:
          try:
            images = [data['image/png']]
          except KeyError:
            images = []
          if 'application/json' in data:
            result = data['application/json']
            data = result['data']
            meta = [{'name': field['name'], 'type': field['type'], 'comment': ''} for field in result['schema']['fields']]
            type = 'table'
          else:
            data = [[data['text/plain']]]
            meta = [{'name': 'Header', 'type': 'STRING_TYPE', 'comment': ''}]
            type = 'text'
        else:
          data = table['data']
          headers = table['headers']
          meta = [{'name': h['name'], 'type': h['type'], 'comment': ''} for h in headers]
          type = 'table'

        # Non start_over not supported
        if not start_over:
          data = []

        return {
          'data': data,
          'images': images,
          'meta': meta,
          'type': type
        }
      elif content['status'] == 'error':
        tb = content.get('traceback', None)

        if tb is None or not tb:
          msg = content.get('ename', 'unknown error')

          evalue = content.get('evalue')
          if evalue is not None:
            msg = '%s: %s' % (msg, evalue)
        else:
          msg = ''.join(tb)

        raise QueryError(msg)

Here you can see:

    # Non start_over not supported
    if not start_over:
      data = []

This start_over flag is what causes the data to be empty; maybe the implementation is incomplete. I tried commenting out that part (the if condition) and the export works properly.
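The effect described here can be reproduced in isolation. The following is a minimal sketch, not Hue code: `finish_fetch` is a hypothetical stand-in for the tail of fetch_result, `to_csv` is a hypothetical CSV writer, and the sample rows are made up. It assumes the export path reaches the check with a false start_over value and that the exporter writes a header row from the metadata.

```python
import csv
import io


def finish_fetch(data, meta, start_over, drop_when_not_start_over=True):
    """Hypothetical stand-in for the tail of fetch_result."""
    if drop_when_not_start_over and not start_over:
        # Mirrors the quoted check: rows are discarded, metadata is kept.
        data = []
    return {'data': data, 'meta': meta, 'type': 'table'}


def to_csv(result):
    """Render a fetch result as CSV text: one header row, then the data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow([col['name'] for col in result['meta']])
    writer.writerows(result['data'])
    return buf.getvalue()


# Made-up sample result, shaped like the output of a "show schemas" query.
meta = [{'name': 'namespace', 'type': 'STRING_TYPE', 'comment': ''}]
rows = [['default'], ['sales']]

# With the check active and start_over false, only the header survives.
with_check = to_csv(finish_fetch(rows, meta, start_over=False))

# With the check disabled (the workaround above), the rows are exported.
without_check = to_csv(
    finish_fetch(rows, meta, start_over=False, drop_when_not_start_over=False)
)

print(repr(with_check))
print(repr(without_check))
```

Disabling the check unconditionally is only a workaround: if a caller fetches repeatedly with start_over false, it would receive the same rows each time.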

divincode commented 2 years ago

Any suggestions, @Harshg999?

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity and is not "roadmap" labeled or part of any milestone. Remove stale label or comment or this will be closed in 5 days.

Harshg999 commented 1 year ago

Thanks @divincode for the detailed analysis! The issue has been fixed via https://github.com/cloudera/hue/pull/3085.

2416210017 commented 1 year ago

> Thanks @divincode for the detailed analysis! The issue has been fixed via #3085.

@Harshg999 @divincode Hello, is there a temporary solution in Hue-4.10?

Harshg999 commented 1 year ago

Hey @2416210017, you can upgrade to the Hue 4.11 release or patch your existing Hue 4.10 deployment with the PR changes: https://github.com/cloudera/hue/pull/3085