Open ivanprado opened 11 years ago
In the case of map output, this is easy: the protocol can be specified in the Configuration and read by ThriftSerialization. Sequence files containing Thrift objects in either the key or the value are harder, because SequenceFileInputFormat/SequenceFileOutputFormat couldn't manage this directly. New {Input/Output}Formats would have to be created, and the expected protocol would be specified via the Configuration or via the SequenceFile header. In this case ThriftSerialization couldn't be used, because without object wrappers à la Avro (AvroKey, AvroValue) it can't tell whether it is handling an input, a map output or an output.
It sounds reasonable.
Iván
(In reply to Eric Palacios's comment of 2013/1/8, above.)
Iván de Prado CEO & Co-founder www.datasalt.com
That would be solved properly by implementing a custom field serializer for Thrift (http://pangool.net/userguide/custom_serialization.html). The field's metadata would store the format used to serialize it. This information would also be carried in the header of the TupleFile.
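To make the header idea concrete, here is a minimal sketch of round-tripping per-field metadata through a byte header and reading the protocol back out of it. Everything in it is an assumption for illustration: the class `FieldMetadataHeader`, the key `thrift.protocol`, and the byte layout are hypothetical and are not Pangool's actual TupleFile format or API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: not part of Pangool, layout invented for this sketch. */
final class FieldMetadataHeader {
    /** Assumed metadata key naming the Thrift protocol used for the field. */
    static final String PROTOCOL_KEY = "thrift.protocol";

    private FieldMetadataHeader() {}

    /** Encodes the metadata as: entry count, then (key, value) UTF pairs. */
    static byte[] write(Map<String, String> metadata) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(metadata.size());
            for (Map.Entry<String, String> entry : metadata.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes a header produced by write(). */
    static Map<String, String> read(byte[] header) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
            int count = in.readInt();
            Map<String, String> metadata = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                metadata.put(key, in.readUTF());
            }
            return metadata;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Falls back to "binary" when the key is absent, e.g. for files written before the option existed. */
    static String protocolOf(Map<String, String> metadata) {
        return metadata.getOrDefault(PROTOCOL_KEY, "binary");
    }
}
```

Since the protocol travels with the file, a reader would not need to be configured with the writer's choice. Defaulting to binary when the key is missing would keep existing files readable.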
Right now Pangool serializes Thrift objects using TBinaryProtocol. It could be interesting to use TCompactProtocol instead, which uses less space. The idea is to make the choice of protocol configurable.
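As a rough sketch of what "configurable" could look like, the snippet below resolves the protocol from a configuration value and defaults to the current binary behaviour. The enum `ThriftProtocolChoice` and the key `pangool.thrift.protocol` are hypothetical names made up for this example, and `java.util.Properties` stands in for the Hadoop Configuration. Only the two Thrift factory class names refer to real classes; they are kept as strings so the sketch compiles without libthrift.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical selector: neither this enum nor the config key exists in Pangool. */
enum ThriftProtocolChoice {
    BINARY("org.apache.thrift.protocol.TBinaryProtocol$Factory"),
    COMPACT("org.apache.thrift.protocol.TCompactProtocol$Factory");

    /** Assumed configuration key, chosen for illustration only. */
    static final String CONF_KEY = "pangool.thrift.protocol";

    private final String factoryClassName;

    ThriftProtocolChoice(String factoryClassName) {
        this.factoryClassName = factoryClassName;
    }

    /** Binary name of the Thrift protocol factory, suitable for Class.forName. */
    String factoryClassName() {
        return factoryClassName;
    }

    /**
     * Reads the choice from the configuration. A missing or blank value keeps
     * the current behaviour (binary); an unknown value throws
     * IllegalArgumentException instead of silently falling back.
     */
    static ThriftProtocolChoice fromConfig(Properties conf) {
        String raw = conf.getProperty(CONF_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return BINARY;
        }
        return valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }
}
```

With libthrift on the classpath, the serialization could then instantiate the factory reflectively from `factoryClassName()`. Failing fast on an unknown value seems safer than guessing, since writing with one protocol and reading with another would give unreadable data.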