Open doublebyte1 opened 5 years ago
Thanks for reporting this. I assume "version 2.0.0" refers to Spatial Framework for Hadoop. Please let us know the versions of Hive and Hadoop.
@randallwhitman Hadoop 2.8.5, Hive 2.3.4
Thanks for the details. We do not have Hive-2.3.4 (nor Hadoop-2.8.5) installed, and unfortunately the testing framework is not at the level of making it easy to paste a sample query into a test - Esri/spatial-framework-for-hadoop#163. Maybe it will reproduce with another version of Hive or with SparkSql.
I can confirm that both issues reproduce on Hadoop 2.8.3 and Hive 2.3.2.
I took a look at reading Enclosed Esri JSON, using 15 points from the JSON-MR mini-sample, and Hive-2.3.5 read the table data OK.
create external table test15eej(rowid int, shape binary)
row format serde 'com.esri.hadoop.hive.serde.EsriJsonSerDe'
stored as inputformat 'com.esri.json.hadoop.EnclosedEsriJsonInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://hdfs:8020/path/to/test15_eej';
hive> select rowid, ST_AsText(shape) from test15eej;
1505 POINT (15 5)
535 POINT (5 35)
2323 POINT (23 23)
3222 POINT (32 22)
3728 POINT (37 28)
2233 POINT (22 33)
2838 POINT (28 38)
3434 POINT (34 34)
6219 POINT (62 19)
7114 POINT (71 14)
7525 POINT (75 25)
6535 POINT (65 35)
5549 POINT (55 49)
6545 POINT (65 45)
4566 POINT (45 66)
I guess that tests only reading, not writing.
Finally reproduced the reported issue.
create external table write15eej(rowid int, shape binary)
row format serde 'com.esri.hadoop.hive.serde.EsriJsonSerDe'
stored as inputformat 'com.esri.json.hadoop.EnclosedEsriJsonInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'file:///tmp/write15eej';
hive> select rowid, ST_AsText(shape) from write15eej;
OK
Time taken: 0.154 seconds
The output file was in fact unenclosed - cat /tmp/write15eej/000000_0:
{"attributes":{"rowid":1505},"geometry":{"x":15,"y":5}}
{"attributes":{"rowid":535},"geometry":{"x":5,"y":35}}
{"attributes":{"rowid":2323},"geometry":{"x":23,"y":23}}
{"attributes":{"rowid":3222},"geometry":{"x":32,"y":22}}
{"attributes":{"rowid":3728},"geometry":{"x":37,"y":28}}
{"attributes":{"rowid":2233},"geometry":{"x":22,"y":33}}
{"attributes":{"rowid":2838},"geometry":{"x":28,"y":38}}
{"attributes":{"rowid":3434},"geometry":{"x":34,"y":34}}
{"attributes":{"rowid":6219},"geometry":{"x":62,"y":19}}
{"attributes":{"rowid":7114},"geometry":{"x":71,"y":14}}
{"attributes":{"rowid":7525},"geometry":{"x":75,"y":25}}
{"attributes":{"rowid":6535},"geometry":{"x":65,"y":35}}
{"attributes":{"rowid":5549},"geometry":{"x":55,"y":49}}
{"attributes":{"rowid":6545},"geometry":{"x":65,"y":45}}
{"attributes":{"rowid":4566},"geometry":{"x":45,"y":66}}
create external table alt15uej(rowid int, shape binary)
row format serde 'com.esri.hadoop.hive.serde.EsriJsonSerDe'
stored as inputformat 'com.esri.json.hadoop.UnenclosedEsriJsonInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'file:///tmp/write15eej';
hive> select rowid, ST_AsText(shape) from alt15uej limit 2;
OK
1505 POINT (15 5)
535 POINT (5 35)
Time taken: 0.146 seconds, Fetched: 2 row(s)
With larger data, the output would be expected to span multiple files. In that case, it's not clear how the file[s] could be enclosed at all - maybe each file of the collection could have Enclosed format?
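To make the per-file idea concrete, here is a rough sketch of wrapping each output file's records in its own enclosing object as a post-processing step. It is an illustration only, not how the SerDe works: it assumes a minimal enclosed document is just {"features": [...]}, whereas real exports usually carry extra metadata (geometry type, spatial reference) that is omitted here, and it has not been verified that EnclosedEsriJsonInputFormat accepts such a minimal document. The function names and the dict-of-files input are hypothetical.

```python
import json


def enclose_records(unenclosed_text):
    """Wrap one-record-per-line Esri JSON in a single {"features": [...]} object."""
    records = [
        json.loads(line) for line in unenclosed_text.splitlines() if line.strip()
    ]
    # Compact separators keep the output close to the style Hive wrote.
    return json.dumps({"features": records}, separators=(",", ":"))


def enclose_part_files(parts):
    """Enclose each part file independently.

    parts maps a file name (e.g. "000000_0") to its unenclosed text;
    the result maps the same names to enclosed text.
    """
    return {name: enclose_records(text) for name, text in parts.items()}
```

Enclosing each file separately keeps every file independently parseable, at the cost of no longer having a single document that spans the whole table.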
I am following the instructions in this tutorial, and I am able to create a table using the UnenclosedEsriJsonInputFormat.
However, I would like to use the enclosed format.
I have tried these two serdes:
Although I am able to create the table and insert data, a select always returns an empty result:
select ST_AsGeoJSON(area), count from taxi_agg;
Changing EnclosedEsriJsonInputFormat to UnenclosedEsriJsonInputFormat, or EnclosedGeoJsonInputFormat to UnenclosedGeoJsonInputFormat, gives correct results. Not sure if I am doing something wrong, or if there is a problem with the Enclosed Serde.
Version: 2.0.0