apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
60.36k stars 13.02k forks

Nested Fields #12940

Closed (saultawil closed this 2 years ago)

saultawil commented 3 years ago

I would like to know whether there are any plans to support nested fields, as some other tools, such as Elasticsearch/Kibana, now do.

I use Superset against a Spark SQL, HiveServer2-compatible JDBC query engine that reads data from Hive tables that support nested data, but in Superset I cannot select the nested fields independently.

I spend a lot of time flattening fields for new projects. Also, for users, referencing the fields with dot notation would be much more intuitive.
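For context, the manual flattening described above typically means writing a view that selects each struct leaf by its dot path (Spark SQL and Hive both allow `struct_col.field` access) and aliases it to a flat column name. The sketch below is a hypothetical helper, not part of Superset: it generates such a SELECT list from a nested schema given as a plain dict. The schema format, the `flatten_select` name, the underscore aliasing convention, and the table/column names are all assumptions for illustration.

```python
from typing import Dict, List, Union

# Hypothetical schema format (an assumption): a nested dict whose leaves are
# type names (strings) and whose inner dicts are struct columns.
# Arrays are intentionally not handled.
Schema = Dict[str, Union[str, "Schema"]]


def flatten_select(schema: Schema, prefix: str = "") -> List[str]:
    """Return one `dot.path AS alias` expression per leaf of a struct schema."""
    exprs: List[str] = []
    for name, field_type in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(field_type, dict):
            # Struct column: recurse into its nested fields.
            exprs.extend(flatten_select(field_type, path))
        else:
            # Leaf field: select it by dot path, alias dots to underscores.
            exprs.append(f"{path} AS {path.replace('.', '_')}")
    return exprs


# Hypothetical table schema used only for this example.
events_schema = {
    "event_id": "string",
    "customer": {"name": "string", "address": {"city": "string", "zip": "string"}},
}

select_list = ",\n  ".join(flatten_select(events_schema))
print(f"CREATE VIEW events_flat AS\nSELECT\n  {select_list}\nFROM events")
```

Identifiers are emitted unquoted here; names containing special characters or reserved words would need backtick quoting in Spark SQL/Hive. Generating this boilerplate for every new dataset is the overhead that native nested-field support would remove.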

A lot of modern data arrives as JSON with nested struct fields and is stored on an object store such as S3. As more data lakes and lakehouses are fronted by query engines like Presto and Spark that handle this kind of nested data, users and developers want a simple way to query it with minimal changes. Supporting this capability would therefore be strategically valuable for Superset.

junlincc commented 3 years ago

@saultawil thanks for posting this enhancement request, and sorry that we couldn't answer your question at the meetup on Wednesday. I don't think we currently have the ability to run independent queries on nested data, though there might be a workaround that I'm not aware of.

Tableau offers something similar to your request: after a user connects to a JSON file, it prompts them to select the schema levels. Is this what you are looking for?

[Screenshots: Tableau's JSON schema-level selection]

@villebro @zhaoyongjie

saultawil commented 3 years ago

Thanks for the response, Junlin. I suppose Tableau may offer access to nested fields, but the example above is more complex because it involves arrays, which can hold repeated sets of nested data. I am looking for support for something much simpler: plain struct fields, where one struct field is made up of multiple nested fields that each occur exactly once. An example from a similar tool, Kibana, is shown in the image attached below. I did not set up this data source, but I do not think it required any special input from the user regarding levels. The tool simply recognizes the nested fields and treats them as separate fields that can be accessed with dot notation, no matter the level:

[Image: Kibana example showing nested fields accessed with dot notation]
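The behaviour requested here, where every struct leaf becomes its own field named by its full dot path at any depth, can be sketched on a single JSON record. This is a hypothetical illustration, not Kibana's or Superset's actual code; `flatten_record` and the sample record are invented for the example. Lists are deliberately kept as single opaque values, since the request is limited to struct fields that occur exactly once.

```python
import json
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Map every nested struct leaf to a dot-notation key, at any depth."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested struct: its fields become separate dot-notation fields.
            flat.update(flatten_record(value, path))
        else:
            # Scalars and arrays are kept as-is under their full path;
            # repeated (array) data is out of scope for this request.
            flat[path] = value
    return flat


raw = '{"id": 7, "device": {"os": {"name": "linux", "version": "6.1"}, "tags": ["a", "b"]}}'
print(flatten_record(json.loads(raw)))
# {'id': 7, 'device.os.name': 'linux', 'device.os.version': '6.1', 'device.tags': ['a', 'b']}
```

In a BI tool, the resulting keys would map to column expressions such as `device.os.name`, so users could pick nested fields directly instead of relying on a pre-flattened view.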