bwhite / hadoopy

Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.
GNU General Public License v3.0

Add the rest of the HDFS manipulation commands (rm, mv, put, etc) #8

Closed by bwhite 13 years ago

bwhite commented 13 years ago

I find ls and cat to be the most useful (and safe!); however, it makes sense to extend this to all HDFS commands.
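As a rough illustration of what extending the wrappers to rm, mv, put and an existence check might look like, here is a minimal sketch. It is not hadoopy's actual code: the function names, the `call` parameter (included so the command runner can be swapped out) and the error handling are all assumptions. It relies only on the standard `hadoop fs -rm`, `-mv`, `-put` and `-test -e` shell commands, and it uses Python 3 syntax.

```python
import subprocess


def hdfs(op, *args, call=subprocess.call):
    """Run `hadoop fs -<op> <args...>` and return its exit code.

    `call` is a hypothetical hook: anything that takes an argv list
    and returns an integer exit code.
    """
    return call(['hadoop', 'fs', '-' + op] + list(args))


def _check(op, *args, call=subprocess.call):
    # Treat any non-zero exit code as a failure of the command.
    code = hdfs(op, *args, call=call)
    if code != 0:
        raise IOError('hadoop fs -%s %s exited with %d'
                      % (op, ' '.join(args), code))


def rm(path, call=subprocess.call):
    """Remove a single HDFS file (non-recursive)."""
    _check('rm', path, call=call)


def mv(src, dst, call=subprocess.call):
    """Move or rename a path within HDFS."""
    _check('mv', src, dst, call=call)


def put(local_path, hdfs_path, call=subprocess.call):
    """Copy a local file into HDFS."""
    _check('put', local_path, hdfs_path, call=call)


def exists(path, call=subprocess.call):
    """Return True if `hadoop fs -test -e <path>` exits with 0."""
    return hdfs('test', '-e', path, call=call) == 0
```

Unlike ls and cat, rm and mv are destructive, so a real implementation would probably want stricter argument checking than this sketch has.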

dgleich commented 13 years ago

See the master branch of https://github.com/dgleich/hadoopy for a quick take on the exists and rm commands. These were semi-ported from Dumbo. Also, my hadoop fs command likes to spew INFO and WARN lines, which was causing ls() to fail, so I changed the error check in ls().
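One way to make an ls wrapper tolerant of INFO and WARN chatter is to judge success by the exit code rather than by whether anything was logged, and to keep only lines that look like listing rows. The sketch below illustrates that idea under stated assumptions. It is not the change made in either branch: `parse_ls_output`, `ls` and the `run` parameter are hypothetical names, and it assumes the usual eight-column `hadoop fs -ls` layout (permissions, replication, owner, group, size, date, time, path). It is written for Python 3.

```python
import subprocess


def parse_ls_output(text):
    """Extract paths from `hadoop fs -ls` output, skipping other lines."""
    paths = []
    for line in text.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # e.g. "Found 2 items" or a blank line
        perms = fields[0]
        # Listing rows start with a permission string such as
        # "-rw-r--r--" or "drwxr-xr-x"; log lines start with a timestamp.
        if len(perms) >= 10 and perms[0] in '-d':
            paths.append(fields[7])
    return paths


def ls(path, run=subprocess.run):
    """List an HDFS path, ignoring anything written to stderr."""
    proc = run(['hadoop', 'fs', '-ls', path],
               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Only the exit code decides failure; stderr noise alone does not.
    if proc.returncode != 0:
        raise IOError('hadoop fs -ls %s exited with %d'
                      % (path, proc.returncode))
    return parse_ls_output(proc.stdout.decode('utf-8', 'replace'))
```

The filter also copes with log lines that end up on stdout, since those begin with a timestamp rather than a permission string.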

bwhite commented 13 years ago

Is ls failing for you on my master branch too? amiller and I made some changes that fixed that problem for him. We have to be careful about license differences when porting code, as this project is GPLv3 and Dumbo is Apache-licensed. I'll take a look at your branch later tonight.

dgleich commented 13 years ago

I used the release-0.3.0 branch, as the master branch doesn't have the glibc detection code in it. That change fixed ls for me. The code from Dumbo was really just the hadoop fs commands and the os.system call on Python 2.4.

bwhite commented 13 years ago

I'd like to standardize the form of this so that ls and cat use the same setup and variable names. One problem is that env={} fails under some configurations. My current workaround is that if the hadoop command can't be called one way, I try the other way.
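The try-one-way-then-the-other approach could be factored into a single helper shared by ls and cat. The following is only a sketch of that idea: the helper name, the order of the attempts (empty environment first, inherited environment second) and the choice to treat `OSError` as "couldn't be called" are assumptions, not what hadoopy does. It is written for Python 3.

```python
import subprocess


def run_with_env_fallback(cmd, popen=subprocess.Popen):
    """Run cmd with env={} first, then with the inherited environment.

    Returns (returncode, stdout_bytes, stderr_bytes) from the first
    attempt that could be launched. `popen` is a hypothetical hook that
    defaults to subprocess.Popen.
    """
    last_error = None
    # env=None tells Popen to inherit the parent's environment.
    for env in ({}, None):
        try:
            proc = popen(cmd, env=env,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        except OSError as e:
            # The command could not be launched this way; try the next.
            last_error = e
            continue
        out, err = proc.communicate()
        return proc.returncode, out, err
    raise last_error
```

With a helper along these lines, ls and cat could both call `run_with_env_fallback(['hadoop', 'fs', ...])` and differ only in how they parse the output.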