randallwhitman opened 3 years ago
With larger data, the output would be expected to span multiple files. In that case it is not clear how the files could be enclosed at all. Maybe each file of the collection could have the Enclosed format? Does the OutputFormat know when a new output file is started? Maybe a custom OutputFormat would be needed?
> maybe each file of the collection could have Enclosed format
I think that can be done by extending/implementing/overriding some to all of:

- `FileOutputFormat` (cf. `HiveIgnoreKeyTextOutputFormat`):
  - `RecordWriter getHiveRecordWriter(JobConf jc, Path outPath, Class<? extends Writable> valueClass, boolean isCompressed, Properties tableProperties, Progressable progress) throws IOException`
  - `RecordWriter<K, V> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException`
  - `OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException`
- `RecordWriter<K, V>` (cf. `LineRecordWriter`):
  - `void close(TaskAttemptContext context) throws IOException, InterruptedException`
  - `void write(K key, V value) throws IOException, InterruptedException`
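To illustrate the per-file enclosing idea, here is a minimal, Hadoop-free sketch (class and method names are hypothetical, and the real Enclosed format may carry extra header fields such as `geometryType` and `spatialReference`, omitted here). The first `write()` emits the opening `{"features":[` header once per output file, subsequent writes are comma-separated, and `close()` appends the closing `]}`. In an actual implementation, this logic would live in the `RecordWriter.write(K, V)` and `close(TaskAttemptContext)` overrides listed above, writing to the task's part-file stream, so every file of a multi-file output would be individually enclosed.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch: per-file Enclosed-JSON writing, independent of Hadoop.
// In a real OutputFormat, 'out' would wrap the FSDataOutputStream of the part
// file, write(...) would be RecordWriter.write(K, V), and close() would be
// RecordWriter.close(TaskAttemptContext).
class EnclosedJsonWriter {
    private final Writer out;
    private boolean firstRecord = true;

    EnclosedJsonWriter(Writer out) {
        this.out = out;
    }

    // Write one feature (an already-serialized JSON object) into the
    // enclosing "features" array of this output file.
    void write(String featureJson) throws IOException {
        if (firstRecord) {
            out.write("{\"features\":[");  // header, once per output file
            firstRecord = false;
        } else {
            out.write(",");                // separator between features
        }
        out.write(featureJson);
    }

    // Close the enclosing array and object; also handles the case of a
    // task that produced no records at all.
    void close() throws IOException {
        if (firstRecord) {
            out.write("{\"features\":[");
        }
        out.write("]}");
        out.flush();
    }
}

public class Demo {
    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        EnclosedJsonWriter w = new EnclosedJsonWriter(sw);
        w.write("{\"geometry\":{\"x\":1,\"y\":2},\"attributes\":{\"id\":1}}");
        w.write("{\"geometry\":{\"x\":3,\"y\":4},\"attributes\":{\"id\":2}}");
        w.close();
        // Prints one self-contained Enclosed-JSON document:
        // {"features":[{...,"attributes":{"id":1}},{...,"attributes":{"id":2}}]}
        System.out.println(sw.toString());
    }
}
```

Because the header is emitted lazily on the first `write()` and `close()` always emits the footer, each part file is well-formed JSON on its own, which is the property the multi-file case needs.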
See also: https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java. The abstract base class appears to have been introduced in Hive 0.11, which is probably OK if we cease supporting older versions.
- `com.esri.json.hadoop.Enclosed{Esri,Geo}JsonOutputFormat`
- and/or `com.esri.hadoop.hive.json.EnclosedEachJsonHiveOutputFormat`
See Esri/gis-tools-for-hadoop#83