randallwhitman opened 3 years ago
With larger data, the output would be expected to span multiple files. In that case it is not clear how the files could be enclosed at all. Maybe each file of the collection could have the Enclosed format? Does the OutputFormat know when a new output file is started? Maybe a custom OutputFormat would be needed?
> maybe each file of the collection could have Enclosed format
I think that can be done by extending/implementing/overriding some to all of:

- `FileOutputFormat` (cf. `HiveIgnoreKeyTextOutputFormat`):
  - `RecordWriter getHiveRecordWriter(JobConf jc, Path outPath, Class<? extends Writable> valueClass, boolean isCompressed, Properties tableProperties, Progressable progress) throws IOException`
  - `RecordWriter<K, V> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException`
  - `OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException`
- `RecordWriter<K, V>` (cf. `LineRecordWriter`):
  - `void close(TaskAttemptContext context) throws IOException, InterruptedException`
  - `void write(K key, V value) throws IOException, InterruptedException`
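To illustrate the per-file enclosing idea, here is a minimal, Hadoop-free sketch (class and method names are hypothetical, and the real Enclosed format may carry extra header fields such as `geometryType` and `spatialReference`, omitted here). The first `write()` emits the opening `{"features":[` header once per output file, subsequent writes are comma-separated, and `close()` appends the closing `]}`. In an actual implementation, this logic would live in the `RecordWriter.write(K, V)` and `close(TaskAttemptContext)` overrides listed above, writing to the task's part-file stream, so every file of a multi-file output would be individually enclosed.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch: per-file Enclosed-JSON writing, independent of Hadoop.
// In a real OutputFormat, 'out' would wrap the FSDataOutputStream of the part
// file, write(...) would be RecordWriter.write(K, V), and close() would be
// RecordWriter.close(TaskAttemptContext).
class EnclosedJsonWriter {
    private final Writer out;
    private boolean firstRecord = true;

    EnclosedJsonWriter(Writer out) {
        this.out = out;
    }

    // Write one feature (an already-serialized JSON object) into the
    // enclosing "features" array of this output file.
    void write(String featureJson) throws IOException {
        if (firstRecord) {
            out.write("{\"features\":[");  // header, once per output file
            firstRecord = false;
        } else {
            out.write(",");                // separator between features
        }
        out.write(featureJson);
    }

    // Close the enclosing array and object; also handles the case of a
    // task that produced no records at all.
    void close() throws IOException {
        if (firstRecord) {
            out.write("{\"features\":[");
        }
        out.write("]}");
        out.flush();
    }
}

public class Demo {
    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        EnclosedJsonWriter w = new EnclosedJsonWriter(sw);
        w.write("{\"geometry\":{\"x\":1,\"y\":2},\"attributes\":{\"id\":1}}");
        w.write("{\"geometry\":{\"x\":3,\"y\":4},\"attributes\":{\"id\":2}}");
        w.close();
        // Prints one self-contained Enclosed-JSON document:
        // {"features":[{...,"attributes":{"id":1}},{...,"attributes":{"id":2}}]}
        System.out.println(sw.toString());
    }
}
```

Because the header is emitted lazily on the first `write()` and `close()` always emits the footer, each part file is well-formed JSON on its own, which is the property the multi-file case needs.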
See also: https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java. The abstract base class appears to have been introduced in Hive 0.11, which is probably OK if we cease supporting older versions.
- `com.esri.json.hadoop.Enclosed{Esri,Geo}JsonOutputFormat`
- and/or `com.esri.hadoop.hive.json.EnclosedEachJsonHiveOutputFormat`
See Esri/gis-tools-for-hadoop#83