Open murphyatwork opened 7 months ago
Maybe, we could implement the second one first, and try to do the first one in the future (if needed), which will use the implementation for the second one.
+1 on this issue. We are currently evaluating StarRocks as a replacement for Trino. However, we very much rely on being able to cast JSON values to Structs at query time which a feature supported by Trino but not by StarRocks.
Using named_struct and parse_json as well as json extract functions works for simple objects decently enough. However, json objects that contain arrays of objects become very challenging to extract if you want to preserve the relations of keys in the inner objects and not flatten the data.
Thanks, @kyle-goodale-klaviyo . This issue focuses on stream load, and we have implemented the conversion of JSON to structs during data ingestion.
Could you share some examples (SQL snippets) demonstrating how to use Trino to convert JSON values into structs?
CC @wangsimo0
@Dshadowzh Absolutely, here is a minimal example that includes nested objects, as well as arrays of objects:
SELECT
CAST(json_parse(
'{"a": 5, "b_obj": {"bb": true}, "b_arr_obj": [{"foo": "bar"}, {"foo": "baz"}]}'
) AS ROW(
a int,
b_obj ROW(
bb boolean
),
b_arr_obj ARRAY(
ROW(
foo varchar
)
)
)
)
Trino calls structs rows, but they are the same thing. One of the important parts is that when casting an array of objects it preserves them as arrays of structs. Currently, I can somewhat get around this issue in StarRocks by using named struct and extracting the json values manually.
StarRocks example:
named_struct(
'first_name', get_json_string(`data`, '$.first_name'),
'last_name', get_json_string(`data`, '$.last_name')
)
This works great for objects and even nested objects. However, this breaks down when you encounter an array of objects if you wish to preserve the inner objects as structs within an array. This method of extraction requires that you flatten everything as there is no way to traverse the json lists and extract values out while maintaining there relations to one another.
Please do let me know if I can provide any more examples or clarify the issue further!
@Dshadowzh Absolutely, here is a minimal example that includes nested objects, as well as arrays of objects:
SELECT CAST(json_parse( '{"a": 5, "b_obj": {"bb": true}, "b_arr_obj": [{"foo": "bar"}, {"foo": "baz"}]}' ) AS ROW( a int, b_obj ROW( bb boolean ), b_arr_obj ARRAY( ROW( foo varchar ) ) ) )
Trino calls structs rows, but they are the same thing. One of the important parts is that when casting an array of objects it preserves them as arrays of structs. Currently, I can somewhat get around this issue in StarRocks by using named struct and extracting the json values manually.
StarRocks example:
named_struct( 'first_name', get_json_string(`data`, '$.first_name'), 'last_name', get_json_string(`data`, '$.last_name') )
This works great for objects and even nested objects. However, this breaks down when you encounter an array of objects if you wish to preserve the inner objects as structs within an array. This method of extraction requires that you flatten everything as there is no way to traverse the json lists and extract values out while maintaining there relations to one another.
Please do let me know if I can provide any more examples or clarify the issue further!
it's better to implement a native function converting JSON ARRAY to ARRAY
Feature request
Is your feature request related to a problem? Please describe.
STRUCT
type is more performant thatJSON
in terms of query, but which use the fixed schema and annoying to construct.If it's possible to convert JSON to STRUCT directly, like
to_struct(JSON)
, it can be much easier to use.Describe the solution you'd like
select to_struct(parse_json("{"a": 1}"))
select to_struct<a int, b int>(parse_json("a": 1, "b": 2))
Describe alternatives you've considered
Additional context