Closed: MjHow912 closed this issue 4 years ago.
Use the EsriJsonSerDe; see the trip-discovery sample (but use the Spatial-Framework-v2 class names, as the sample still needs updating for that).
So, if the table was created and saved in Hive as:

```sql
CREATE TABLE taxi_agg(area BINARY, count DOUBLE)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```
To get back to JSON format for exporting, I found this code and altered it a bit:

```sql
ALTER TABLE taxi_agg SET FILEFORMAT
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  OUTPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
  SERDE 'com.esri.hadoop.hive.serde.EsriJsonSerDe';
```
Error:

```
FAILED: SemanticException Cannot find class 'com.esri.hadoop.hive.serde.EsriJsonSerDe'
```
I also tried:

```sql
ALTER TABLE test15json SET FILEFORMAT
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  OUTPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
  ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.EsriJsonSerDe';
```
Error:

```
MismatchedTokenException(226!=233)
    at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
    at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
    at org.apache.hadoop.hive.ql.parse.HiveParser.fileFormat(HiveParser.java:15604)
    at org.apache.hadoop.hive.ql.parse.HiveParser.alterStatementSuffixFileFormat(HiveParser.java:12890)
    at org.apache.hadoop.hive.ql.parse.HiveParser.alterTblPartitionStatementSuffix(HiveParser.java:8406)
    at org.apache.hadoop.hive.ql.parse.HiveParser.alterTableStatementSuffix(HiveParser.java:8159)
    at org.apache.hadoop.hive.ql.parse.HiveParser.alterStatement(HiveParser.java:7433)
    at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2702)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1665)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1122)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1219)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1260)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1146)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 4:1 mismatched input 'ROW' expecting SERDE near ''com.esri.json.hadoop.UnenclosedJsonInputFormat'' in file format specification
```
Any suggestions on what to try?
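For what it's worth, the two errors above point at two separate problems: the `SemanticException` usually means the Spatial Framework jar isn't registered in the Hive session, and the `ParseException` ("mismatched input 'ROW' expecting SERDE") is because `ALTER TABLE ... SET FILEFORMAT` takes a bare `SERDE` clause, not `ROW FORMAT SERDE`. Note also that the `INPUTFORMAT`/`OUTPUTFORMAT` class names in the attempts above appear swapped. A sketch of what a corrected statement might look like (the jar path and name are placeholders):

```sql
-- Register the Spatial Framework for Hadoop jar first (path is a placeholder):
ADD JAR /path/to/spatial-sdk-hive.jar;

-- SET FILEFORMAT expects SERDE (not ROW FORMAT SERDE), with the
-- input/output format classes in their usual positions:
ALTER TABLE taxi_agg SET FILEFORMAT
  INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  SERDE 'com.esri.hadoop.hive.serde.EsriJsonSerDe';
```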
@randallwhitman Installed, and the table altered. I still run into the problem of exporting it, though. WebHDFS isn't going to work because of authenticating to a kerberized cluster. If I can't use the ArcMap tool that copies from HDFS and turns it into JSON, can it be done manually?
If you have access, `scp` and/or the `hdfs dfs` CLI, etc.
But that's an invalid file; it's not JSON. The conversion to JSON is still missing. In the example it's done by the GIS tool. When moving manually, where/how is the conversion done?
Remind me, conversion of exactly what to what?
https://github.com/Esri/gis-tools-for-hadoop/wiki/Aggregating-CSV-Data-%28Spatial-Binning%29
Attempting to move the taxi_agg table into ArcMap, the steps after step 12 are geared only toward transporting and transforming via the GIS tool.
Can you spell out what needs to be converted, from what format, to what format?
The SerDe lets you take the sample data that is in JSON format and convert it into something readable in Hive. Once you are finished working with the data, the taxi_agg table is still in a format that is readable by Hive. In order for it to be viewed in ArcMap, it has to go through a conversion process:

1. copy the Hive table from HDFS to a temporary JSON file;
2. JSON to feature file.
The GIS Tools toolkit that ArcMap has does the first conversion for you. If you're unable to connect to Hive from ArcMap, there needs to be a manual way of converting the Hive table to JSON. If you export the Hive table, the file is .deflate and does not work in ArcMap, because ArcMap doesn't recognize the file type.
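To sketch the manual path in Python: once the part files are copied off HDFS, the unenclosed JSON records (one object per line, as Hive emits them) can be wrapped into a single enclosed JSON document. This is an illustration, not the actual GP-tool code; the record shape and the top-level `"features"` wrapper are assumptions based on the Esri JSON samples.

```python
import json

def enclose_unenclosed_json(lines):
    """Wrap unenclosed Esri JSON records (one object per line) into a
    single enclosed JSON document with a top-level "features" array."""
    features = [json.loads(line) for line in lines if line.strip()]
    return json.dumps({"features": features})

# Example: two unenclosed records, as Hive would emit them line by line
# (attribute/geometry values are made up for illustration).
records = [
    '{"attributes": {"count": 3.0}, "geometry": {"x": 1.0, "y": 2.0}}',
    '{"attributes": {"count": 7.0}, "geometry": {"x": 4.0, "y": 5.0}}',
]
enclosed = enclose_unenclosed_json(records)
```

In practice you would read `lines` from the exported part files and write `enclosed` to a `.json` file that ArcMap-side tooling can consume.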
Take a look at the trip-discovery sample as mentioned 11/17. The output is converted to JSON, under the section entitled "Import into ArcMap and visualize" in the associated blog post. Update the recipes for v2.
@randallwhitman I believe there is a disconnect in what I'm attempting to explain to you. Once you run:

```sql
CREATE EXTERNAL TABLE trip_origin_json (
  totnum INT, samedest INT, pct DOUBLE,
  destlhs DOUBLE, destbot DOUBLE, destrhs DOUBLE, desttop DOUBLE,
  shape BINARY)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/rwhitman/trip-origin-json';
```
yes, you technically have a JSON table that's compressed. Unfortunately, that file format is .deflate, not .json. In every example shown, the only way to go from a .deflate file to a .json file is via the GIS toolkit in ArcMap. I'm clueless as to how to do this part manually. How do I go from .deflate to .json without the toolkit? At this point I may need to run some sort of decompression code. Do you know what commands are run in the "Copy from HDFS" tool that takes 'trip-origin-json' to 'trips-cor-01-tgcs.json'?
I don't remember the JSON being compressed. Maybe that is a change in a newer version of Hive?
Try something like `hdfs dfs -cat path/to/file | zcat`.
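If the part files really are DEFLATE-compressed (Hive's default codec writes zlib-wrapped DEFLATE, though this may vary by cluster configuration), a small Python sketch can decompress them after they've been copied off HDFS. The file path in `inflate_file` is a placeholder:

```python
import zlib

def inflate_file(path):
    """Decompress a zlib/DEFLATE-compressed Hive part file and return the text."""
    with open(path, "rb") as f:
        return zlib.decompress(f.read()).decode("utf-8")

# Round-trip demo standing in for a real part-00000.deflate pulled off HDFS
# (the record content is made up for illustration):
compressed = zlib.compress(b'{"attributes": {"count": 3.0}}\n')
text = zlib.decompress(compressed).decode("utf-8")
```

If `zlib.decompress` complains about the header, the file may be raw DEFLATE, in which case `zlib.decompress(data, -zlib.MAX_WBITS)` is the usual fallback.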
> Do you know what the commands are that are run in the "Copy from HDFS" [tool]
@MjHow912 If I understand your question correctly, starting with Step 6 on this page: https://github.com/Esri/gis-tools-for-hadoop/wiki/Getting-the-results-of-a-Hive-query-into-ArcGIS
The ArcToolbox geoprocessing (GP) tool 'Copy From HDFS' is just running this code: https://github.com/Esri/geoprocessing-tools-for-hadoop/blob/master/HadoopTools.pyt#L193. The main business logic is found here: https://github.com/Esri/geoprocessing-tools-for-hadoop/blob/master/webhdfs/webhdfs.py#L126
The GP tool 'JSON To Features' is just running the code found here: https://github.com/Esri/geoprocessing-tools-for-hadoop/blob/master/HadoopTools.pyt#L369, with the main business logic found here (for unenclosed JSON): https://github.com/Esri/geoprocessing-tools-for-hadoop/blob/master/JSONUtil.py#L179
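Loosely mirroring that unenclosed-JSON handling (a sketch, not the repo's actual code): unenclosed JSON is just JSON objects concatenated back to back, which the standard library's `json.JSONDecoder.raw_decode` can walk through even without newline separators:

```python
import json

def parse_unenclosed(text):
    """Return a list of the objects in a stream of concatenated JSON objects."""
    decoder = json.JSONDecoder()
    idx, n = 0, len(text)
    objs = []
    while idx < n:
        # Skip any whitespace between records.
        while idx < n and text[idx].isspace():
            idx += 1
        if idx >= n:
            break
        # raw_decode returns the parsed object and the index just past it.
        obj, end = decoder.raw_decode(text, idx)
        objs.append(obj)
        idx = end
    return objs

# Demo with made-up records, separated by a space and a newline:
records = parse_unenclosed('{"a": 1} {"b": 2}\n{"c": 3}')
```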
After you take a look at that code, please post any questions or errors you run into with your kerberized cluster. (I didn't see any errors caused by the kerberized cluster in the original issue post. If you can post the actual error you get when running the GP tool, we could probably track down the problem and fix it in the repo, so that the GP tool would work for others with kerberized clusters in the future.)
After running the taxi demo in Hive, I want to convert the resulting table back to JSON so that I can export it to a shared location where my users can access it and open it in ArcMap. I cannot use the copy tool in ArcMap because my cluster is kerberized, which causes more trouble. It's a better process for me to convert the output back to JSON. Has anyone discovered how to do so?