Esri / spatial-framework-for-hadoop

The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
Apache License 2.0

Output with Enclosed InputFormats #177

Open randallwhitman opened 3 years ago

randallwhitman commented 3 years ago

See Esri/gis-tools-for-hadoop#83

randallwhitman commented 3 years ago

With larger data, the output would be expected to span multiple files. In that case, it's not clear how the file[s] could be enclosed at all - maybe each file of the collection could have Enclosed format? Does the InputFormat have the info when an output file is started? Maybe a custom OutputFormat would be needed?
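One way to picture "each file of the collection could have Enclosed format" is a writer that emits an opening wrapper when a file is opened, commas between records, and a closing wrapper on close. In Hadoop's `FileOutputFormat`, `getRecordWriter` is called per task, and each task writes its own part file. So a custom OutputFormat would know when a file starts (the RecordWriter is constructed) and when it ends (`close`). The sketch below is a minimal, stdlib-only model of that lifecycle, not Hadoop code. The class name `EnclosedJsonWriter` is hypothetical. The `{"features":[ ... ]}` wrapper is an assumed minimal Esri JSON FeatureSet shape.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Hypothetical sketch: models what a RecordWriter returned by a custom
 * OutputFormat could do so that every output file is a self-contained
 * ("enclosed") JSON collection. The prefix is written once when the file is
 * opened, a comma separates records, and the suffix is written on close.
 */
public class EnclosedJsonWriter implements AutoCloseable {
    private final Writer out;
    private final String suffix;
    private boolean first = true;

    public EnclosedJsonWriter(Writer out, String prefix, String suffix) throws IOException {
        this.out = out;
        this.suffix = suffix;
        out.write(prefix); // opening wrapper, once per output file
    }

    /** Appends one feature that is already serialized as a JSON object. */
    public void write(String featureJson) throws IOException {
        if (!first) {
            out.write(','); // separator between features in the array
        }
        out.write(featureJson);
        first = false;
    }

    @Override
    public void close() throws IOException {
        out.write(suffix); // closing wrapper, written even if no records arrived
        out.close();
    }

    public static void main(String[] args) throws IOException {
        // Simulate two output files (e.g. two reducers), each enclosed independently.
        StringWriter part0 = new StringWriter();
        StringWriter part1 = new StringWriter();
        try (EnclosedJsonWriter w0 = new EnclosedJsonWriter(part0, "{\"features\":[", "]}");
             EnclosedJsonWriter w1 = new EnclosedJsonWriter(part1, "{\"features\":[", "]}")) {
            w0.write("{\"attributes\":{\"id\":1}}");
            w0.write("{\"attributes\":{\"id\":2}}");
            w1.write("{\"attributes\":{\"id\":3}}");
        }
        System.out.println(part0); // {"features":[{"attributes":{"id":1}},{"attributes":{"id":2}}]}
        System.out.println(part1); // {"features":[{"attributes":{"id":3}}]}
    }
}
```

In an actual `org.apache.hadoop.mapreduce.RecordWriter`, the prefix would go in the constructor, the body in `write(key, value)`, and the suffix in `close(TaskAttemptContext)`. Writing the suffix unconditionally keeps even an empty part file a valid enclosed document.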

randallwhitman commented 3 years ago

> maybe each file of the collection could have Enclosed format

I think that can be done by extending/implementing/overriding some-to-all of:

randallwhitman commented 3 years ago

See also: https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
Its abstract base class appears to have been introduced in Hive 0.11, which is probably OK if we drop support for older versions.

randallwhitman commented 3 years ago

com.esri.json.hadoop.Enclosed{Esri,Geo}JsonOutputFormat and/or com.esri.hadoop.hive.json.EnclosedEachJsonHiveOutputFormat
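If the `Enclosed{Esri,Geo}JsonOutputFormat` pair were implemented, the two variants might differ only in the wrapper text around the comma-separated features. The sketch below is a hypothetical illustration; the enum `EnclosedWrapper` and its `enclose` method are not part of the framework. The GeoJSON wrapper follows the RFC 7946 FeatureCollection shape. The Esri wrapper is an assumed minimal FeatureSet, since a real one may also carry `geometryType`, `spatialReference`, and `fields`.

```java
import java.util.List;

/**
 * Hypothetical sketch: the {Esri,Geo} variants as two wrapper configurations
 * around the same comma-separated list of serialized features.
 */
public enum EnclosedWrapper {
    // Assumed minimal Esri JSON FeatureSet wrapper.
    ESRI("{\"features\":[", "]}"),
    // GeoJSON FeatureCollection wrapper (RFC 7946).
    GEO("{\"type\":\"FeatureCollection\",\"features\":[", "]}");

    private final String prefix;
    private final String suffix;

    EnclosedWrapper(String prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix;
    }

    /** Joins already-serialized feature objects into one enclosed document. */
    public String enclose(List<String> features) {
        return prefix + String.join(",", features) + suffix;
    }

    public static void main(String[] args) {
        List<String> feats = List.of("{\"type\":\"Feature\",\"geometry\":null,\"properties\":{}}");
        // {"type":"FeatureCollection","features":[{"type":"Feature","geometry":null,"properties":{}}]}
        System.out.println(GEO.enclose(feats));
    }
}
```

Keeping the per-feature serialization shared and varying only the wrapper would let the Esri and GeoJSON output formats reuse one RecordWriter implementation.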