Since Druid 0.23.0, fetching columns through pydruid fails with the error shown in the stacktrace below, which prevents Superset from adding datasets from Druid. As described in the Druid PR that introduced the relevant change:
The only change in this PR that will be apparent to most users is that now that complex type information is preserved through-out the engine, the INFORMATION_SCHEMA columns table can display the complex type information instead of OTHER:
Steps to reproduce:
Instantiate a SQLAlchemy engine against a Druid database (>= 0.23.0).
Call get_columns() on a datasource that has columns with complex types.
Expected:
Columns are returned and complex types are mapped to BLOB.
Actual:
pydruid errors out (see stacktrace below).
2022-07-08 08:50:31,350:ERROR:root:'complex<approximatehistogram>'
Traceback (most recent call last):
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/flask_appbuilder/api/__init__.py", line 85, in wraps
return f(self, *args, **kwargs)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/views/base_api.py", line 112, in wraps
raise ex
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/views/base_api.py", line 109, in wraps
duration, response = time_function(f, self, *args, **kwargs)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/utils/core.py", line 1468, in time_function
response = func(*args, **kwargs)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/utils/log.py", line 245, in wrapper
value = f(*args, **kwargs)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/views/base_api.py", line 82, in wraps
return f(self, *args, **kwargs)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/datasets/api.py", line 257, in post
new_model = CreateDatasetCommand(g.user, item).run()
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/datasets/commands/create.py", line 46, in run
self.validate()
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/datasets/commands/create.py", line 86, in validate
if database and not DatasetDAO.validate_table_exists(
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/datasets/dao.py", line 81, in validate_table_exists
database.get_table(table_name, schema=schema)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/models/core.py", line 671, in get_table
return Table(
File "<string>", line 2, in __new__
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 139, in warned
return fn(*args, **kwargs)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 563, in __new__
metadata._remove_table(name, schema)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.raise_(
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 558, in __new__
table._init(name, metadata, *args, **kw)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 647, in _init
self._autoload(
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 670, in _autoload
autoload_with.run_callable(
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2212, in run_callable
return conn.run_callable(callable_, *args, **kwargs)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1653, in run_callable
return callable_(self, *args, **kwargs)
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 484, in reflecttable
return insp.reflecttable(
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 664, in reflecttable
for col_d in self.get_columns(
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 390, in get_columns
col_defs = self.dialect.get_columns(
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/pydruid/db/sqlalchemy.py", line 178, in get_columns
return [
File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/pydruid/db/sqlalchemy.py", line 181, in <listcomp>
"type": type_map[row.DATA_TYPE.lower()],
KeyError: 'complex<approximatehistogram>'
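The last frame of the stacktrace can be reproduced in isolation, without a running Druid cluster. The following is a minimal sketch, not pydruid's actual code: `type_map` is an abbreviated stand-in whose values are placeholder strings rather than SQLAlchemy type classes, and `Row`, `describe`, and the column names are hypothetical.

```python
from collections import namedtuple

# Stand-in for a row of INFORMATION_SCHEMA.COLUMNS (only the fields used here).
Row = namedtuple("Row", ["COLUMN_NAME", "DATA_TYPE"])

# Abbreviated stand-in for the dialect's type map; values are placeholder
# strings so the sketch runs without third-party packages.
type_map = {"varchar": "String", "bigint": "BigInteger", "other": "BLOB"}

# Hypothetical columns: Druid >= 0.23.0 reports the complex type by name
# where older versions reported OTHER.
rows = [
    Row("page", "VARCHAR"),
    Row("latency_hist", "COMPLEX<approximateHistogram>"),
]


def describe(rows):
    # Same shape as the failing list comprehension: a strict dict lookup
    # on the lower-cased DATA_TYPE.
    return [
        {"name": r.COLUMN_NAME, "type": type_map[r.DATA_TYPE.lower()]}
        for r in rows
    ]


try:
    describe(rows)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError('complex<approximatehistogram>')
```

The lookup fails because the map has an entry for `other` but none for the new `complex<...>` type strings.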
I have submitted a proposed change that would fix the issue (https://github.com/druid-io/pydruid/pull/288). I am filing this issue for completeness' sake.
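One way to get the expected behaviour (complex types mapped to BLOB) is to treat any `complex<...>` type string like `other` before doing the lookup. This is only a sketch of the idea and may differ from what the linked PR actually does; `TYPE_MAP` and `resolve_type` are hypothetical names, and the string values stand in for SQLAlchemy type classes.

```python
# Abbreviated stand-in for the dialect's type map; string values replace
# SQLAlchemy type classes so the sketch has no third-party dependencies.
TYPE_MAP = {
    "varchar": "String",
    "bigint": "BigInteger",
    "double": "Float",
    "timestamp": "TIMESTAMP",
    "other": "BLOB",
}


def resolve_type(data_type: str) -> str:
    """Hypothetical helper: map a Druid DATA_TYPE string to a type name."""
    key = data_type.lower()
    if key.startswith("complex<"):
        # Druid >= 0.23.0 reports e.g. COMPLEX<approximateHistogram>
        # where older versions reported OTHER, so reuse that mapping.
        return TYPE_MAP["other"]
    # Unknown non-complex types still raise KeyError, as before.
    return TYPE_MAP[key]


print(resolve_type("COMPLEX<approximateHistogram>"))  # BLOB
print(resolve_type("VARCHAR"))  # String
```

Matching on the `complex<` prefix avoids having to list every complex type name, at the cost of mapping all of them to the same generic type.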