dask / hdfs3

A wrapper for libhdfs3 to interact with HDFS from Python
http://hdfs3.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

No file copy support despite appearing in the docs. #156

RobertoRRW closed this issue 5 years ago.

RobertoRRW commented 6 years ago

Hello, in the documentation there's a mention of the cp command: hdfs.cp('/tmp/remote-file.txt', '/tmp/copied-file.txt')

There's even a failing test_copy entry in the tests. There's no cp command in the library though, nor is there anything functionally equivalent.

It would be nice if cp could be implemented, since it's such a fundamental operation.
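Until a native copy exists, a copy can be assembled from operations hdfs3 already documents: `open`, plus chunked reads and writes. The sketch below makes a few assumptions:

- `fs.open(path, mode)` returns file-like objects that work as context managers. hdfs3's `HDFileSystem.open` is documented that way.
- `copy_via_stream` and the `InMemoryFS` stand-in are hypothetical names, used here only for illustration.
- Every byte passes through the client, so this is a workaround rather than a server-side copy.

```python
import io
import shutil


def copy_via_stream(fs, src, dst, chunk_size=2**20):
    """Client-side copy: read `src` in chunks and write them to `dst`.

    Assumption: `fs.open(path, mode)` returns context-manager file objects
    with read(n) / write(b). Data is streamed through the client machine.
    """
    with fs.open(src, 'rb') as fin, fs.open(dst, 'wb') as fout:
        shutil.copyfileobj(fin, fout, chunk_size)


class _MemWriter:
    """Hypothetical writer that stores its buffer in a dict on close."""

    def __init__(self, store, path):
        self._store, self._path, self._buf = store, path, io.BytesIO()

    def write(self, data):
        return self._buf.write(data)

    def close(self):
        self._store[self._path] = self._buf.getvalue()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


class InMemoryFS:
    """Hypothetical stand-in for an HDFS client, for demonstration only."""

    def __init__(self):
        self.files = {}

    def open(self, path, mode='rb'):
        if mode == 'rb':
            return io.BytesIO(self.files[path])
        if mode == 'wb':
            return _MemWriter(self.files, path)
        raise ValueError("unsupported mode: %r" % mode)
```

With a live cluster, one would pass an `HDFileSystem` instance instead of `InMemoryFS`, for example `copy_via_stream(hdfs, '/tmp/remote-file.txt', '/tmp/copied-file.txt')`. The chunk size bounds client memory use; 1 MiB is an arbitrary default.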

martindurant commented 6 years ago

I don't know how this got missed; perhaps HDFS users usually rely on replication of the file rather than copying. In any case, libhdfs3 does have hdfsCopy, which is already specced in lib, so this is fairly easy to implement. Are you interested in putting in a PR?
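To make the suggestion concrete, the sketch below shows how a `cp` wrapper might bind `hdfsCopy` through ctypes. It rests on these assumptions:

- The C prototype follows libhdfs: `int hdfsCopy(hdfsFS srcFS, const char *src, hdfsFS dstFS, const char *dst)`.
- The call returns 0 on success and -1 on error.
- `declare_hdfs_copy`, `cp`, and the `lib`/`fs_handle` arguments are hypothetical names, not hdfs3's actual internals.

The sketch only shows the binding. Whether a call actually copies anything depends on the native library.

```python
import ctypes


def declare_hdfs_copy(lib):
    """Attach an assumed C prototype for hdfsCopy to a loaded library handle.

    Assumed prototype (libhdfs-style):
        int hdfsCopy(hdfsFS srcFS, const char *src,
                     hdfsFS dstFS, const char *dst);
    """
    lib.hdfsCopy.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                             ctypes.c_void_p, ctypes.c_char_p]
    lib.hdfsCopy.restype = ctypes.c_int
    return lib


def cp(lib, fs_handle, src, dst):
    """Hypothetical cp: same filesystem handle for source and destination."""
    ret = lib.hdfsCopy(fs_handle, src.encode('utf-8'),
                       fs_handle, dst.encode('utf-8'))
    # Assumed convention: 0 on success, -1 on error.
    if ret != 0:
        raise OSError("hdfsCopy failed: %s -> %s" % (src, dst))
```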

RobertoRRW commented 6 years ago

Yeah, I can look into it over the weekend.

vincent-grosbois commented 5 years ago

Hello, any news on this? It seems it is still missing.

martindurant commented 5 years ago

@vincent-grosbois: As I've mentioned here and elsewhere, hdfs3 is not being developed any more, but this particular piece of functionality would be very easy to implement, if you are interested. Neither Arrow's HDFS interface nor WebHDFS has a copy method, for some reason. (It is unclear to me whether libhdfs, the JNI library used by Arrow, supports copy.)

vincent-grosbois commented 5 years ago

Thanks for the answer! I tried to implement it on my own, but then I realized: https://github.com/Pivotal-DataFabric/attic-libhdfs3/blob/apache-rpc-9/src/client/Hdfs.cpp#L879

It seems that hdfsCopy is not implemented in the native library! (I'm assuming this is the "official" source code for the native implementation.)

martindurant commented 5 years ago

Hah, OK, so no one implemented it at the lower level. I wonder whether even the Java libraries implement this. Sorry for not checking for you; I had assumed that if the function exists, it does something. This issue should be closed, then (though I suppose the docs could be updated to reflect the real situation).