apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.89k stars 1.11k forks source link

Implement nested identifier access ( "Nested identifiers not yet supported" ) #11445

Open alamb opened 2 months ago

alamb commented 2 months ago

Is your feature request related to a problem or challenge?

This came up on discord: https://discord.com/channels/885562378132000778/885562378132000781/1261359404197089443

> create or replace view foo as select  {'field1': { 'subfield1': 'foo', subfield2: 'bar', subfield3: [10,20,30,40]}} as "the_column";
0 row(s) fetched.
Elapsed 0.001 seconds.

> select * from foo;
+-------------------------------------------------------------------------+
| the_column                                                              |
+-------------------------------------------------------------------------+
| {field1: {subfield1: foo, subfield2: bar, subfield3: [10, 20, 30, 40]}} |
+-------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

> select the_column.field1.subfield3[2] from foo;
Internal error: Nested identifiers not yet supported for column foo.the_column.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Describe the solution you'd like

The query should work and return 20

Describe alternatives you've considered

Single level access works fine:

> create or replace view foo as select { 'subfield1': 'foo', subfield2: 'bar', subfield3: [10,20,30,40]} as "field1" ;
0 row(s) fetched.
Elapsed 0.008 seconds.

> select * from foo;
+---------------------------------------------------------------+
| field1                                                        |
+---------------------------------------------------------------+
| {subfield1: foo, subfield2: bar, subfield3: [10, 20, 30, 40]} |
+---------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.

> select field1.subfield3[2] from foo;
+----------------------------------------------------+
| get_field(foo.field1, Utf8("subfield3"))[Int64(2)] |
+----------------------------------------------------+
| 20                                                 |
+----------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

Additional context

No response

kavirajk commented 1 month ago

take

kavirajk commented 1 month ago

@alamb kinda stuck on this. I track down where the error is coming from.

But don't know which datatype (I assume Datatype::Struct?) methods needs to be modified to allow nested access here in getfield.rs. Appreciate any leads/tips? thanks.

alamb commented 1 month ago

Sorry @kavirajk I am not super familar with the struct access so I can't offer any code specific suggestions

What I would probably do is remove the check, then write some tests, and then work to get all the tests passing.

kavirajk commented 1 month ago

Thanks @alamb good idea.