apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.72k stars 13.85k forks source link

Lack of support for arrag_agg on column type tstzrange in PostgreSQL #28200

Open Molrn opened 6 months ago

Molrn commented 6 months ago

Bug description

Superset returns an error when I try a query with an array_agg on a tstzrange column. The same error was previously returned in #19995 when querying a tstzrange column on its own (which does not return any error in my superset version).

How to reproduce the bug

Considering Superset have a connection to a PostgreSQL database and there is a table with one column of type tstzrange, go to "SQL Lab" -> "SQL Editor" and try a select query that includes an array_agg on the tstzrange column. e.g: SELECT ARRAY_AGG(date_range) FROM tstzrange_example.
SELECT date_range FROM tstzrange_example should work (resolved in #19995)

Screenshots/recordings

image

Superset version

3.1.2

Python version

3.9

Node version

I don't know

Browser

Firefox

Additional context

Logs

Triggering query_id: 63
superset_app          | 2024-04-24 16:05:26,686:INFO:superset.commands.sql_lab.execute:Triggering query_id: 63
superset_app          | Query 63: Executing 1 statement(s)
superset_app          | 2024-04-24 16:05:26,717:INFO:superset.sql_lab:Query 63: Executing 1 statement(s)
superset_app          | Query 63: Set query to 'running'
superset_app          | 2024-04-24 16:05:26,717:INFO:superset.sql_lab:Query 63: Set query to 'running'
superset_app          | Query 63: Running statement 1 out of 1
superset_app          | 2024-04-24 16:05:26,956:INFO:superset.sql_lab:Query 63: Running statement 1 out of 1
superset_app          | 2024-04-24 16:05:26,990:DEBUG:root:Query 63: Running query: select array_agg(date_range) from tstzrange_example
superset_app          | LIMIT 1001
superset_app          | 2024-04-24 16:05:27,093:DEBUG:root:Query 63: Handling cursor
superset_app          | SupersetErrorsException
superset_app          | Traceback (most recent call last):
superset_app          |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
superset_app          |     rv = self.dispatch_request()
superset_app          |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
superset_app          |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
superset_app          |   File "/usr/local/lib/python3.9/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
superset_app          |     return f(self, *args, **kwargs)
superset_app          |   File "/app/superset/views/base_api.py", line 127, in wraps
superset_app          |     raise ex
superset_app          |   File "/app/superset/views/base_api.py", line 121, in wraps
superset_app          |     duration, response = time_function(f, self, *args, **kwargs)
superset_app          |   File "/app/superset/utils/core.py", line 1463, in time_function
superset_app          |     response = func(*args, **kwargs)
superset_app          |   File "/app/superset/views/base_api.py", line 93, in wraps
superset_app          |     return f(self, *args, **kwargs)
superset_app          |   File "/app/superset/utils/log.py", line 255, in wrapper
superset_app          |     value = f(*args, **kwargs)
superset_app          |   File "/app/superset/sqllab/api.py", line 407, in execute_sql_query
superset_app          |     command_result: CommandResult = command.run()
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 121, in run
superset_app          |     raise ex
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 103, in run
superset_app          |     status = self._run_sql_json_exec_from_scratch()
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 161, in _run_sql_json_exec_from_scratch
superset_app          |     raise ex
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 156, in _run_sql_json_exec_from_scratch
superset_app          |     return self._sql_json_executor.execute(
superset_app          |   File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
superset_app          |     raise SupersetErrorsException(
superset_app          | superset.exceptions.SupersetErrorsException: [SupersetError(message="Unserializable object [2022-06-08 15:17:43.554000+00:00, 2022-06-09 15:17:49.528000+00:00] of type <class 'psycopg2._range.DateTimeTZRange'>", error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'PostgreSQL', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app          | 2024-04-24 16:05:27,314:WARNING:superset.views.base:SupersetErrorsException
superset_app          | Traceback (most recent call last):
superset_app          |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
superset_app          |     rv = self.dispatch_request()
superset_app          |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
superset_app          |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
superset_app          |   File "/usr/local/lib/python3.9/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
superset_app          |     return f(self, *args, **kwargs)
superset_app          |   File "/app/superset/views/base_api.py", line 127, in wraps
superset_app          |     raise ex
superset_app          |   File "/app/superset/views/base_api.py", line 121, in wraps
superset_app          |     duration, response = time_function(f, self, *args, **kwargs)
superset_app          |   File "/app/superset/utils/core.py", line 1463, in time_function
superset_app          |     response = func(*args, **kwargs)
superset_app          |   File "/app/superset/views/base_api.py", line 93, in wraps
superset_app          |     return f(self, *args, **kwargs)
superset_app          |   File "/app/superset/utils/log.py", line 255, in wrapper
superset_app          |     value = f(*args, **kwargs)
superset_app          |   File "/app/superset/sqllab/api.py", line 407, in execute_sql_query
superset_app          |     command_result: CommandResult = command.run()
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 121, in run
superset_app          |     raise ex
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 103, in run
superset_app          |     status = self._run_sql_json_exec_from_scratch()
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 161, in _run_sql_json_exec_from_scratch
superset_app          |     raise ex
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 156, in _run_sql_json_exec_from_scratch
superset_app          |     return self._sql_json_executor.execute(
superset_app          |   File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
superset_app          |     raise SupersetErrorsException(
superset_app          | superset.exceptions.SupersetErrorsException: [SupersetError(message="Unserializable object [2022-06-08 15:17:43.554000+00:00, 2022-06-09 15:17:49.528000+00:00] of type <class 'psycopg2._range.DateTimeTZRange'>", error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'PostgreSQL', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app          | 90.63.193.49 - - [24/Apr/2024:16:05:27 +0000] "POST /api/v1/sqllab/execute/ HTTP/1.1" 500 371 "https://dataviz.hyvilo.net/sqllab/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
superset_app          | 90.63.193.49 - - [24/Apr/2024:16:05:28 +0000] "POST /superset/log/?explode=events HTTP/1.1" 200 1 "https://dataviz.hyvilo.net/sqllab/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
superset_app          | 90.63.193.49 - - [24/Apr/2024:16:05:31 +0000] "PUT /tabstateview/15 HTTP/1.1" 200 2 "https://dataviz.hyvilo.net/sqllab/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"

Checklist

surapuramakhil commented 6 months ago

Is this bug? Or unsupported? If it's a bug, I would like to fix it.

Can some help me to point serialization code. I have tried debugging, but not able to find that due to missing of stack in catch and throw.

For this "Unserializable object [2022-06-08 15:17:43.554000+00:00, 2022-06-09 15:17:49.528000+00:00] of type <class 'psycopg2._range.DateTimeTZRange'>" only message has been captured and thrown but its missing stack trace. This is coming as errors of data object from get_sql_query_results task.

https://github.com/apache/superset/blob/76d897eaa2f9e137102bc194c2e3109c29d0348f/superset/sqllab/sql_json_executer.py#L110

I have done dump of data object, unfortunately on errors it only on contains error message but not stacktrace

Dump of data object at L110

superset_app           | 2024-04-28 02:05:25,134:INFO:superset.sqllab.sql_json_executer:{'query_id': 25, 'status': 'failed', 'error': "Unserializable object [2023-04-27 08:00:00+00:00, 2023-04-27 10:00:00+00:00) of type <class 'psycopg2._range.DateTimeTZRange'>", 'errors': [{'message': "Unserializable object [2023-04-27 08:00:00+00:00, 2023-04-27 10:00:00+00:00) of type <class 'psycopg2._range.DateTimeTZRange'>", 'error_type': <SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, 'level': <ErrorLevel.ERROR: 'error'>, 'extra': {'engine_name': 'PostgreSQL', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]}}]}
surapuramakhil commented 6 months ago

Able to extract stacktrace - its issue with converting numpy to pyArray

superset_app           | 2024-04-29 17:26:16,337:ERROR:superset.result_set:Could not convert DateTimeTZRange(datetime.datetime(2023, 4, 27, 8, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2023, 4, 27, 10, 0, tzinfo=datetime.timezone.utc), '[)') with type DateTimeTZRange: did not recognize Python value type when inferring an Arrow data type
superset_app           | Traceback (most recent call last):
superset_app           |   File "/app/superset/result_set.py", line 141, in __init__
superset_app           |     pa_data.append(pa.array(array[column].tolist()))
superset_app           |   File "pyarrow/array.pxi", line 344, in pyarrow.lib.array
superset_app           |   File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
superset_app           |   File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
superset_app           |   File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
surapuramakhil commented 6 months ago

@sfirke can you assign this task to me.

My MR https://github.com/apache/superset/pull/28268