apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0
1.93k stars 984 forks source link

json example on drill site doesn't work #2834

Closed mbeckerle closed 9 months ago

mbeckerle commented 9 months ago

Using drill 1.21

The example on page https://drill.apache.org/docs/json-data-model/ of querying the citylots.json does not work:

SELECT features[0].properties.MAPBLKLOT, FROM dfs.`/home/mbeckerle/Downloads/citylots.json.gz`;
Error: PARSE ERROR: Encountered "." at line 1, column 19.

SQL Query: SELECT features[0].properties.MAPBLKLOT, FROM dfs.`/home/mbeckerle/Downloads/citylots.json.gz`
                             ^

[Error Id: bfb6226d-273f-4a9f-a58b-81c9cd32c18c ] (state=,code=0)

In addition, I've tried numerous variants of this query to see if I can find out what is wrong, but I can't get anything to work.

This much works:

SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`;
+----------------------------------------------------------------------------------+
|                                     feature                                      |
+----------------------------------------------------------------------------------+
| {"type":"Feature","properties":{"MAPBLKLOT":"0001001","BLKLOT":"0001001","BLOCK_NUM":"0001","LOT_NUM":"001","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"},"geometry":{"type":"Polygon","coordinates":[[[-122.42200352825247,37.80848009696725,0.0],[-122.42207601332528,37.808835019815085,0.0],[-122.42110217434863,37.808803534992904,0.0],[-122.42106256906727,37.80860105681815,0.0],[-122.42200352825247,37.80848009696725,0.0]]]}} |
| {"type":"Feature","properties":{"MAPBLKLOT":"0002001","BLKLOT":"0002001","BLOCK_NUM":"0002","LOT_NUM":"001","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"},"geometry":{"type":"Polygon","coordinates":[[[-122.42082593937107,37.80863147414603,0.0],[-122.4208580496797,37.80879564136959,0.0],[-122.4198119587043,37.80876180971401,0.0],[-122.42082593937107,37.80863147414603,0.0]]]}} |
| {"type":"Feature","properties":{"MAPBLKLOT":"0004002","BLKLOT":"0004002","BLOCK_NUM":"0004","LOT_NUM":"002","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"},"geometry":{"type":"Polygon","coordinates":[[[-122.41570120460688,37.80832725267146,0.0],[-122.4157607435932,37.808630700240904,0.0],[-122.4137878913324,37.80856680131984,0.0],[-122.41570120460688,37.80832725267146,0.0]]]}} |
+----------------------------------------------------------------------------------+

But if I then try this in a sub-select to try to dig out the MAPBLKLOT....

 select t.properties.MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;

select t.properties.MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
Error: PARSE ERROR: Encountered "." at line 1, column 9.

SQL Query: select t.properties.MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t
                   ^

[Error Id: 49670771-e7d5-41c7-9ede-9fa42e3ff261 ] (state=,code=0)
mbeckerle commented 9 months ago

I found this works, but I don't know why the "['properties']" notation is needed vs. just "."

select t.feature['properties']['MAPBLKLOT'] as MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
select t.feature['properties']['MAPBLKLOT'] as MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
+-----------+
| MAPBLKLOT |
+-----------+
| 0001001   |
| 0002001   |
| 0004002   |
+-----------+
3 rows selected (0.128 seconds)
apache drill> 
mbeckerle commented 9 months ago

It appears 'properties' is a keyword. This works, note the backticks around properties

select t.feature.`properties` as property from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
select t.feature.`properties` as property from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
+----------------------------------------------------------------------------------+
|                                     property                                     |
+----------------------------------------------------------------------------------+
| {"MAPBLKLOT":"0001001","BLKLOT":"0001001","BLOCK_NUM":"0001","LOT_NUM":"001","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"} |
| {"MAPBLKLOT":"0002001","BLKLOT":"0002001","BLOCK_NUM":"0002","LOT_NUM":"001","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"} |
| {"MAPBLKLOT":"0004002","BLKLOT":"0004002","BLOCK_NUM":"0004","LOT_NUM":"002","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"} |
+----------------------------------------------------------------------------------+
3 rows selected (0.157 seconds)
apache drill (dfs.tmp)> 

Sure enough 'PROPERTIES' is in the keywords list here: https://drill.apache.org/docs/reserved-keywords/

So the original JSON example query can be expressed as:

SELECT features[0].`properties`.MAPBLKLOT  FROM `/tmp/citydata.json`;
mbeckerle commented 9 months ago

Fixed in drill-site commit de30f12f4b37f73caac35660d0cae8f8d88f9b7a

jnturton commented 9 months ago

@mbeckerle thanks for this!