dask / knit

Deprecated: please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead.
http://knit.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

hdfs_home no longer respected #105

jcrist closed this 7 years ago

jcrist commented 7 years ago

#101 broke support for alternate hdfs_home locations. When an application is started with a non-default hdfs_home, the file is uploaded to the correct location, but the address the container downloads it from is incorrect.

from knit import Knit

# Don't rebind the name `knit`; that would shadow the module
k = Knit(hdfs_home='/tmp/knit')
k.start('ls', files=['hdfs://path/to/my/file.zip'])
# Container starts, but fails to download file

In the logs I see Java exceptions like:

java.io.FileNotFoundException: File does not exist: hdfs://HOSTNAME:8020/user/yarn/.knitDeps/file.zip

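HOSTNAME stands in for the real address; the rest of the path is copied verbatim from the error. It looks like the code takes the basename of the passed-in path and uses the relative path .knitDeps/<basename> for every file. HDFS resolves a relative path against the submitting user's home directory (here /user/yarn), not against the configured hdfs_home, so the container looks in the wrong place.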
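A minimal sketch of the suspected mismatch. It assumes uploads land at <hdfs_home>/.knitDeps/<basename> and that the user's HDFS home is /user/yarn, as the error path suggests. The helpers upload_path, buggy_download_path and fixed_download_path are hypothetical illustrations, not knit's actual functions.

```python
import posixpath

# Hypothetical helpers illustrating the suspected bug; NOT knit's real code.

def upload_path(hdfs_home, src):
    """Where the file is uploaded: under the configured hdfs_home (assumed layout)."""
    return posixpath.join(hdfs_home, '.knitDeps', posixpath.basename(src))

def buggy_download_path(src, user_home='/user/yarn'):
    """Suspected bug: only the relative path '.knitDeps/<basename>' is kept,
    so HDFS resolves it against the user's home directory (assumed /user/yarn)."""
    relative = posixpath.join('.knitDeps', posixpath.basename(src))
    return posixpath.join(user_home, relative)

def fixed_download_path(hdfs_home, src):
    """Possible fix: derive the download address from the same hdfs_home used for upload."""
    return upload_path(hdfs_home, src)

src = 'hdfs://path/to/my/file.zip'
print(upload_path('/tmp/knit', src))          # /tmp/knit/.knitDeps/file.zip
print(buggy_download_path(src))               # /user/yarn/.knitDeps/file.zip
print(fixed_download_path('/tmp/knit', src))  # /tmp/knit/.knitDeps/file.zip
```

The point of the sketch is that upload and download should share one absolute path built from hdfs_home, rather than each side reconstructing it independently.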

Almost certainly related to #104.