A best-effort attempt at turning unstructured JSON into structs.
If the JSON is extremely inconsistent but you still need to get it into Iceberg, set transform.json.root:true. This creates a struct with a single field named "payload" whose value is a Map<String, String>.
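To make the shape of the root-mode output concrete, here is a minimal Python sketch. It is not the SMT's code: the function name `to_root_payload` is hypothetical, and the way values are turned into strings (strings pass through, everything else is re-serialized as JSON text) is an assumption made for illustration.

```python
import json


def to_root_payload(raw: str) -> dict:
    """Hypothetical illustration of root mode: one "payload" map of string to string."""
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")
    payload = {}
    for key, value in record.items():
        # Assumption: strings pass through, other values become JSON text.
        payload[key] = value if isinstance(value, str) else json.dumps(value)
    return {"payload": payload}


example = to_root_payload(
    '{"id": 7, "name": "a", "tags": ["x", 1], "meta": {"k": true}}'
)
print(example)
```

Whatever the exact string encoding, the point is that the table schema stays fixed at one map column no matter how the incoming keys vary.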
The default configuration (transform.json.root:false) creates a struct in which every first-level primitive (int, long, string, etc.) becomes a typed field. Nested objects become Map<String, String> fields, to be parsed by the query engine. Arrays of primitives are typed properly, including nested arrays of primitives. Arrays of mixed types are converted to arrays of strings.
Empty nodes, empty arrays, and empty objects are stripped from the struct/schema.
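The default-mode rules above can be sketched roughly as follows. This is an illustrative Python approximation, not the connector's implementation. All function names are hypothetical, and several details are assumptions: null stands in for an "empty node", values are stringified as JSON text, and any array that is not uniformly typed (including arrays containing objects) is treated as mixed.

```python
import json


def leaf_types(value, found):
    """Collect the Python type of every leaf in a possibly nested list."""
    if isinstance(value, list):
        for item in value:
            leaf_types(item, found)
    else:
        found.add(type(value))
    return found


def to_string(value):
    # Assumption: strings pass through, other values become JSON text.
    return value if isinstance(value, str) else json.dumps(value)


def convert_field(value):
    if isinstance(value, dict):
        # Nested object -> map of string to string, left for the query engine.
        return {k: to_string(v) for k, v in value.items()}
    if isinstance(value, list):
        types = leaf_types(value, set())
        if len(types) == 1 and next(iter(types)) in (bool, int, float, str):
            return value  # uniformly typed (possibly nested) array of primitives
        return [to_string(v) for v in value]  # mixed array -> array of strings
    return value  # first-level primitive keeps its type


def convert_record(raw):
    record = json.loads(raw)
    # Assumption: null, [] and {} are the "empty" values that get stripped.
    return {
        k: convert_field(v)
        for k, v in record.items()
        if v is not None and v != [] and v != {}
    }


result = convert_record(
    '{"id": 1, "name": "a", "scores": [[1, 2], [3]], "mixed": [1, "x", true],'
    ' "meta": {"a": 1, "b": {"c": 2}}, "empty": {}, "none": null, "nothing": []}'
)
print(result)
```

In this sketch `id` and `name` keep their types, `scores` stays a nested integer array, `mixed` becomes an array of strings, `meta` becomes a string-to-string map, and the three empty fields disappear.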
Without this SMT, the connector's JSON schema inference treats nested objects as Structs. Inconsistent keys can then cause an explosion of schema evolutions, potentially producing hundreds to thousands of columns depending on the JSON. This SMT avoids that by processing the JSON and defining a schema in which nested objects are Maps.
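A small Python sketch can illustrate why inconsistent keys are a problem. It only counts columns under two simplified models and does not reproduce the connector's inference. The helper names and the synthetic `records` are hypothetical.

```python
def struct_columns(records):
    """Union of leaf paths, roughly what struct-style inference would add."""
    columns = set()

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else key, child)
        else:
            columns.add(prefix)

    for record in records:
        walk("", record)
    return columns


def map_columns(records):
    """Top-level keys only: each nested object collapses into one map column."""
    return {key for record in records for key in record}


# 100 records, each with a different key inside the nested "attrs" object.
records = [{"id": i, "attrs": {f"key_{i}": i}} for i in range(100)]

print(len(struct_columns(records)))  # one column per distinct nested key, plus "id"
print(len(map_columns(records)))     # just "id" and "attrs"
```

Under the struct model every new nested key is a schema evolution, while under the map model the schema stops changing after the first record.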