Closed Kai-Chen closed 6 years ago
It would be great if this makes it easier to start writing ORC and Parquet.
Jacob
On Dec 19, 2017, 17:04 -0500, Kai Chen notifications@github.com, wrote:
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
ORC/Parquet are outside the scope of this ticket.
We’ll talk about this offline, but given how standard those formats are, I wanted to put flexibility on the mind of the ticket implementor. But I guess out of scope for this (vaguely written) ticket...
Jacob
On Dec 19, 2017, 19:28 -0500, Ali Tajeldin notifications@github.com, wrote:
ORC/Parquet are outside the scope of this ticket.
@kmannislands After the PR is merged, see testHdfsConn.py for an example of use.
The entry point from Python is SmvApp.copyToHdfs.
The only expectation is that the file object from the upload is opened in binary read mode. Let me know if this is not the case, and we will work on that.
Cheers!
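For reference, a minimal sketch of what "opened in binary read mode" means on the Python side. The helper name `is_binary_readable` is hypothetical and not part of SMV; it only checks that a zero-length read succeeds and yields `bytes`.

```python
import io


def is_binary_readable(fileobj):
    """Return True if fileobj can be read and yields bytes.

    Hypothetical helper for illustration only, not part of SMV.
    """
    if not hasattr(fileobj, "read"):
        return False
    try:
        # A zero-length read consumes nothing: binary streams return b"",
        # text streams return "".
        return isinstance(fileobj.read(0), bytes)
    except (OSError, ValueError):
        # Write-only or closed streams raise instead of returning data.
        return False


print(is_binary_readable(io.BytesIO(b"abc")))  # True
print(is_binary_readable(io.StringIO("abc")))  # False
```

A file opened with `open(path, "rb")` passes this check, while one opened with the default text mode does not.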
@jacobdr ... since my mind is mentioned :)
A stream of bytes is probably as flexible as you can get with any api, so you can certainly upload an ORC or a Parquet file.
But conversion is a different matter -- not the kind of matter that I'd mind, though :)
-- Sorry about the bad puns ... gotta get that out of my system ... it's holidays.
Needs to be usable from Python, so it may involve converting a Python stream to a Java stream.
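One way to picture the Python-to-Java handoff is a chunked copy: read fixed-size `bytes` blocks from the Python file object and pass each block to a writer. In the sketch below, `copy_in_chunks` and the `sink` object are hypothetical; `sink` stands in for whatever wraps the Java output stream, and nothing here reflects the actual SMV implementation.

```python
import io


def copy_in_chunks(src, sink, chunk_size=64 * 1024):
    """Copy a binary readable stream to sink.write() in fixed-size chunks.

    Hypothetical helper: sink is any object with a write(bytes) method,
    standing in for a wrapper around a Java OutputStream.
    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Usage with in-memory streams in place of an upload and an HDFS writer.
src = io.BytesIO(b"x" * 150000)
dst = io.BytesIO()
copied = copy_in_chunks(src, dst)
```

Copying in chunks keeps memory use bounded regardless of file size, which matters if the uploads are large.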