NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[FEA] Add support for top level arrays in from_json #11717

Open revans2 opened 1 week ago

revans2 commented 1 week ago

Is your feature request related to a problem? Please describe.

from_json supports maps, arrays, and structs as top level schemas. We have partial support for MAP<STRING,STRING> and struct (with no maps in it) as top level objects. We should be able to work with CUDF to support arrays as top level schemas and see what that would look like.

At a minimum we could probably hack it and wrap each row with something like

{"FAKE": <ORIGINAL ROW>}

But that is not what we want to do. Instead we should work with CUDF to do the right thing.
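
For reference, the wrapping workaround mentioned above could be sketched roughly as follows. This is plain Python using only the standard `json` module, not plugin or CUDF code; the helper names `wrap_row` and `parse_top_level_array` are hypothetical and only illustrate the idea of turning a top level array into a field of a struct, parsing it, and then unwrapping it.

```python
import json

# Placeholder field name taken from the issue text; any name would work.
WRAPPER_KEY = "FAKE"


def wrap_row(raw):
    """Wrap one raw JSON string so a top level array becomes a struct field."""
    if raw is None:
        return None
    return '{"%s": %s}' % (WRAPPER_KEY, raw)


def parse_top_level_array(raw):
    """Parse via the wrapper, then unwrap.

    Returns None for null input, invalid JSON, or a value that is not an array.
    """
    wrapped = wrap_row(raw)
    if wrapped is None:
        return None
    try:
        value = json.loads(wrapped)[WRAPPER_KEY]
    except (ValueError, KeyError):
        return None
    return value if isinstance(value, list) else None


if __name__ == "__main__":
    for row in ['[1, 2, 3]', '[{"a": 1}]', 'not json', None]:
        print(repr(row), "->", parse_top_level_array(row))
```

One reason this stays a hack: a malformed row such as `[1], "x": 2` turns into valid JSON once it is wrapped, so the wrapper can hide errors that parsing the array directly would catch.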