druid-io / pydruid

A Python connector for Druid

Getting columns for a Druid datasource fails with a KeyError #289

Closed: Usiel closed this issue 1 year ago.

Usiel commented 1 year ago

Since Druid 0.23.0, getting columns fails with the error attached below, which prevents Superset from adding datasets from Druid. As described in the Druid PR that introduces the relevant change:

"The only change in this PR that will be apparent to most users is that now that complex type information is preserved through-out the engine, the INFORMATION_SCHEMA columns table can display the complex type information instead of OTHER."

I have submitted a proposed fix for the issue (https://github.com/druid-io/pydruid/pull/288) and am opening this issue for completeness' sake.

Steps to reproduce:

Expected: columns are returned, and complex types are mapped to BLOB.
Actual: pydruid errors out (see the stack trace below).

2022-07-08 08:50:31,350:ERROR:root:'complex<approximatehistogram>'
Traceback (most recent call last):
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/flask_appbuilder/api/__init__.py", line 85, in wraps
    return f(self, *args, **kwargs)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/views/base_api.py", line 112, in wraps
    raise ex
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/views/base_api.py", line 109, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/utils/core.py", line 1468, in time_function
    response = func(*args, **kwargs)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/utils/log.py", line 245, in wrapper
    value = f(*args, **kwargs)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/views/base_api.py", line 82, in wraps
    return f(self, *args, **kwargs)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/datasets/api.py", line 257, in post
    new_model = CreateDatasetCommand(g.user, item).run()
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/datasets/commands/create.py", line 46, in run
    self.validate()
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/datasets/commands/create.py", line 86, in validate
    if database and not DatasetDAO.validate_table_exists(
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/datasets/dao.py", line 81, in validate_table_exists
    database.get_table(table_name, schema=schema)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/superset/models/core.py", line 671, in get_table
    return Table(
  File "<string>", line 2, in __new__
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 139, in warned
    return fn(*args, **kwargs)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 563, in __new__
    metadata._remove_table(name, schema)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 558, in __new__
    table._init(name, metadata, *args, **kw)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 647, in _init
    self._autoload(
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 670, in _autoload
    autoload_with.run_callable(
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2212, in run_callable
    return conn.run_callable(callable_, *args, **kwargs)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1653, in run_callable
    return callable_(self, *args, **kwargs)
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 484, in reflecttable
    return insp.reflecttable(
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 664, in reflecttable
    for col_d in self.get_columns(
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 390, in get_columns
    col_defs = self.dialect.get_columns(
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/pydruid/db/sqlalchemy.py", line 178, in get_columns
    return [
  File "/opt/.pyenv/versions/3.8.12/envs/superset/lib/python3.8/site-packages/pydruid/db/sqlalchemy.py", line 181, in <listcomp>
    "type": type_map[row.DATA_TYPE.lower()],
KeyError: 'complex<approximatehistogram>' 
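
The last frames show where it breaks: the dialect's `get_columns` looks each `DATA_TYPE` up directly in `type_map`, so a type name the map does not contain, such as `complex<approximatehistogram>`, raises `KeyError`. Below is a minimal sketch of the expected behaviour (complex types mapped to BLOB). It is not the code from the linked PRs: `TYPE_MAP`, `resolve_type`, `Row` and `columns_from_rows` are hypothetical names, and the map values are plain strings rather than SQLAlchemy type classes so the example runs without SQLAlchemy or a Druid cluster.

```python
from collections import namedtuple

# Hypothetical, simplified subset of a Druid SQL type -> SQLAlchemy type-name map.
# Assumption: the real pydruid dialect maps to SQLAlchemy type classes, not strings.
TYPE_MAP = {
    "varchar": "String",
    "bigint": "BigInteger",
    "double": "Float",
    "float": "Float",
    "timestamp": "TIMESTAMP",
    "other": "BLOB",
}

# Hypothetical stand-in for a row read from INFORMATION_SCHEMA.COLUMNS.
Row = namedtuple("Row", ["COLUMN_NAME", "DATA_TYPE"])


def resolve_type(data_type: str) -> str:
    """Map an INFORMATION_SCHEMA DATA_TYPE to a type name."""
    key = data_type.lower()
    # Since Druid 0.23.0, complex columns are reported as e.g. COMPLEX<...>
    # instead of OTHER; treat every complex type as opaque binary, like OTHER.
    if key.startswith("complex<"):
        return "BLOB"
    # Any other unknown type still raises KeyError, as before.
    return TYPE_MAP[key]


def columns_from_rows(rows):
    """Build column dicts the way the failing list comprehension does."""
    return [{"name": row.COLUMN_NAME, "type": resolve_type(row.DATA_TYPE)} for row in rows]
```

For example, `resolve_type("complex<approximatehistogram>")` returns `"BLOB"` instead of raising. Matching on the `complex<` prefix, rather than listing each complex type, avoids having to enumerate every complex type name that Druid or its extensions can report.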

Usiel commented 1 year ago

This issue would also be fixed by https://github.com/druid-io/pydruid/pull/290

Usiel commented 1 year ago

Fixed with https://github.com/druid-io/pydruid/pull/288