druid-io / pydruid

A Python connector for Druid

fix: Druid Type mappings for Druid >= 0.23.0 #288

Closed Usiel closed 1 year ago

Usiel commented 1 year ago

Druid 0.23.0 returns more granular types in the INFORMATION_SCHEMA.COLUMNS DATA_TYPE column (see the corresponding PR on Druid). This breaks the get_columns method in the DruidDialect class, because it doesn't know what to do with values like "Complex<thetaSketch>", which were previously returned with DATA_TYPE="OTHER".
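
To illustrate the failure mode, here is a minimal sketch. It assumes the dialect looked up the lowercased DATA_TYPE string in a fixed dictionary, which is what the KeyError in the reproduction steps suggests. The dictionary name, its contents, and both helper functions are hypothetical, not pydruid's actual code.

```python
# Hypothetical name-based mapping, keyed on the lowercased DATA_TYPE string.
# The values are SQLAlchemy type names, kept as strings so the sketch
# runs without SQLAlchemy installed.
NAME_TYPE_MAP = {
    "varchar": "String",
    "bigint": "BigInteger",
    "timestamp": "TIMESTAMP",
    "other": "BLOB",  # what complex columns reported before Druid 0.23.0
}


def lookup_by_name(data_type):
    """Resolve a DATA_TYPE value; raises KeyError for unknown names."""
    return NAME_TYPE_MAP[data_type.lower()]


def fails_with_key_error(data_type):
    """Return True if the name-based lookup cannot resolve data_type."""
    try:
        lookup_by_name(data_type)
    except KeyError:
        return True
    return False
```

Under these assumptions, "OTHER" resolves, while "Complex<thetaSketch>" has no entry and the lookup raises.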

With this suggested change, we instead use the JDBC types Druid provides and map them to SQLAlchemy types. The advantage is that Druid resolves the mapping for complex types and we just consume what it reports, which should make the mapping more resilient to future changes on the Druid side.
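
A minimal sketch of that approach, assuming the dialect reads a numeric JDBC type code for each column. The integer keys are the standard java.sql.Types constants. The dictionary, the function name, the choice of SQLAlchemy type per code, and the BLOB fallback are illustrative assumptions, not the code in this PR. It also assumes Druid reports complex columns with the OTHER code (1111). Types are given by name so the sketch runs without SQLAlchemy installed.

```python
# java.sql.Types constants mapped to SQLAlchemy type names (illustrative).
JDBC_TYPE_MAP = {
    -5: "BigInteger",  # BIGINT
    6: "Float",        # FLOAT
    8: "Float",        # DOUBLE
    12: "String",      # VARCHAR
    16: "Boolean",     # BOOLEAN
    93: "TIMESTAMP",   # TIMESTAMP
    1111: "BLOB",      # OTHER (assumed to cover complex columns)
}


def sqla_type_name(jdbc_type, default="BLOB"):
    """Map a JDBC type code to a SQLAlchemy type name.

    Unknown codes fall back to `default` instead of raising, so a new
    type on the Druid side does not break reflection (an assumption of
    this sketch).
    """
    return JDBC_TYPE_MAP.get(int(jdbc_type), default)
```

Because the lookup is keyed on the type code rather than the DATA_TYPE string, a value such as "Complex<thetaSketch>" never reaches the mapping.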

FYI, this bug breaks Superset: it is not possible to add datasets with complex columns when using Druid 0.23.0.

How to reproduce the issue

  1. Follow the tutorial at https://druid.apache.org/docs/0.23.0/tutorials/index.html up to step 13
  2. Add a thetaSketch metric in the generated ingestion spec:
    "metricsSpec": [
        {
          "type": "thetaSketch",
          "name": "complex_metric",
          "fieldName": "countryName"
        }
    ]
  3. Continue with the remaining steps
  4. Reflect the metadata. Reflection relies on get_columns(...), which fails without this PR (KeyError: 'complex<thetasketch>'):
    from sqlalchemy import create_engine, MetaData

    # Point the engine at the Druid router's SQL endpoint.
    engine = create_engine('druid://druid-router.dca.tumblr.net:8082/druid/v2/sql')
    meta = MetaData(schema="druid")
    # reflect() asks the dialect for each table's columns via get_columns().
    meta.reflect(engine)

    Expected: With the new code the reflection works and complex metrics are mapped to BLOB.

gianm commented 1 year ago

@Usiel I think there's a conflict between this patch and your other one. Could you fix it up please?

Usiel commented 1 year ago

> @Usiel I think there's a conflict between this patch and your other one. Could you fix it up please?

Rebased on master, thanks for the reviews!