fullstorydev / dbt_fullstory

The official FullStory dbt package
Apache License 2.0
2 stars 3 forks source link

parse json before querying, output json column types #17

Closed huttotw closed 9 months ago

huttotw commented 9 months ago

This modifies the json_value macro to parse the JSON before attempting to extract a value. This causes the native JSON_QUERY function to return a column of JSON type instead of STRING which creates a better interface for users.

In short, now customers can query our event_properties, source_properties and user_properties using dot notation in BigQuery. Snowflake is unaffected.

SELECT
     user_properties.my_custom_property
FROM events;

instead of

SELECT
   JSON_QUERY(user_properties, '$.my_custom_property')
FROM events;

When the events table is built incrementally and this is persisted, it may enable some better performance by BigQuery since it's now aware the column is JSON (compression, etc).

huttotw commented 9 months ago

@camphillips22 the json_value function has changed with will ultimately affect the events model. @zerodelayy noticed that the column types in the event models were still be set as STRING in BigQuery. After examining the docs a little closer, JSON_QUERY only outputs JSON if the input is JSON, otherwise a STRING. Since BigQuery is synced with properties as a string, we needed to call PARSE_JSON prior to JSON_QUERY so that the return type of JSON_QUERY is JSON.

Before, you could still access JSON properties using the JSON functions provided by BigQuery, this is just a slightly more "correct" way to do it.