ClickHouse / metabase-clickhouse-driver

ClickHouse database driver for the Metabase business intelligence front-end
Apache License 2.0
476 stars 92 forks source link

Enable `:nested-field-columns` feature in the driver #252

Open frankyso opened 3 months ago

frankyso commented 3 months ago

Enhance ClickHouse Driver to Support JSON Parsing

Introduction

Feature Description

Benefits

Additional Context

slvrtrn commented 3 months ago

From your description, it looks like we need to enable the :nested-field-columns feature in the driver and implement the required methods.

https://github.com/metabase/metabase/blob/v0.50.7/src/metabase/driver.clj#L482-L483

;; Does this database support nested fields but only for certain field types (e.g. Postgres and JSON / JSONB columns)?
:nested-field-columns

This is probably also related: https://www.metabase.com/docs/latest/data-modeling/json-unfolding

Is it correct?

frankyso commented 3 months ago

Yes, that's correct. Enabling the :nested-field-columns feature for the ClickHouse driver seems to be the appropriate approach for supporting JSON parsing. This feature, as described in the Metabase documentation, matches our requirements for handling JSON fields effectively

slvrtrn commented 3 months ago

@frankyso, I tried a few things with the driver code, and while ClickHouse works well with JSON stored in String columns, it's tricky to make the driver recognize this as a "nested" type (so the JSON features will be enabled for this particular column) because String is already matched to a "text" MB base type. Hacks like using a workaround based on a type (something like Variant(String)) don't look like a good option to me, and the old JSON ClickHouse type is now obsolete.

However, luckily, the new semi-structured data type (to be used in exactly these scenarios) is planned to be available in 24.7 (see https://github.com/ClickHouse/ClickHouse/issues/54864#issuecomment-2189803374). Using this new type will allow the implementation of the nested-field-columns feature to be much cleaner.

So, when 24.7 (or maybe even the head version containing the required changeset) is out, I will start working on this in the driver code.

cptjacky commented 1 day ago

Hello @slvrtrn !

The new JSON field was released a couple of versions ago (behind a feature flag) but its usage with Metabase is not very convenient :/

Selecting any property of the JSON field without casting it to a proper type fails with:

java.lang.IllegalArgumentException: Unknown data type: Dynamic

Would the implementation of nested-field-columns improve this or are we forced to explicitly type all of the JSON columns we use?

Thanks!

slvrtrn commented 1 day ago

@cptjacky, I think we will need to wait until https://github.com/ClickHouse/clickhouse-java/issues/1833 is resolved.

cptjacky commented 14 hours ago

You're right! I'll subscribe to that issue too, thank you!