dask / hdfs3

A wrapper for libhdfs3 to interact with HDFS from Python
http://hdfs3.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Kerberos Support #91

Closed: pkasinathan closed this issue 8 years ago

pkasinathan commented 8 years ago

Hi Team,

Does hdfs3 support Kerberos? I tried to follow the documented signature HDFileSystem(host=None, port=None, user=None, ticket_cache=None, token=None, pars=None, connect=True) to connect to a kerberized HDFS name node, but it's not working.

Can you please give me an example or a reference showing how to use hdfs3 to connect to a kerberized cluster?

Appreciate your support!

Thanks!

jettify commented 8 years ago

For Kerberos integration, the following should be done:

1) Compile libgsasl with the flag --with-gssapi-impl=mit, as described here: https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3/issues/18
2) Compile libhdfs3.
3) In HDFileSystem, also pass pars = {"hadoop.security.authentication": "kerberos"} and the appropriate ticket_cache (you can take it from the klist command).
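The connection step above can be sketched in Python as follows. This is a minimal sketch, not a verified recipe: the helper names, the host, the port 8020, and the ticket cache path are all hypothetical placeholders, and it assumes libhdfs3 and libgsasl were already built with GSSAPI support and that a valid ticket exists (for example after running kinit). Only the HDFileSystem keyword arguments and the pars entry come from this thread.

```python
def kerberos_connect_kwargs(host, port, ticket_cache):
    """Build the HDFileSystem keyword arguments for a kerberized cluster.

    ticket_cache is the credential cache path reported by `klist`.
    """
    return {
        "host": host,
        "port": port,
        "ticket_cache": ticket_cache,
        # Tell libhdfs3 to authenticate with Kerberos.
        "pars": {"hadoop.security.authentication": "kerberos"},
    }


def connect(host, port, ticket_cache):
    """Open the connection; hdfs3 is imported lazily so the helper above
    can be used and inspected without hdfs3 installed."""
    from hdfs3 import HDFileSystem
    return HDFileSystem(**kerberos_connect_kwargs(host, port, ticket_cache))


# Hypothetical values; replace with your name node and your own cache path.
kwargs = kerberos_connect_kwargs("namenode.example.com", 8020, "/tmp/krb5cc_1000")
print(kwargs)
```

Calling connect() with the same three arguments would then attempt the actual connection. Keeping the argument construction separate makes it easy to print and compare the settings against what klist reports before connecting.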

quasiben commented 8 years ago

We have used hdfs3 with Kerberos. The testing setup can be found on the wiki: https://github.com/dask/hdfs3/wiki/Kerberos-Testing . We have had some issues, though, as noted here: https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3/issues/53. This happens when hadoop.rpc.protection is set to privacy.
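To see whether a cluster falls into that case, one option is to read the hadoop.rpc.protection value from the cluster's core-site.xml. The following is a rough sketch, assuming the usual Hadoop configuration/property XML layout; the function name and the sample XML are illustrative only and are not part of hdfs3.

```python
import xml.etree.ElementTree as ET


def rpc_protection(core_site_xml):
    """Return the hadoop.rpc.protection value from core-site.xml text,
    or None if the property is not set in that file."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hadoop.rpc.protection":
            return (prop.findtext("value") or "").strip()
    return None


# Illustrative configuration; a real file would be read from disk instead.
sample = """
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
</configuration>
"""

level = rpc_protection(sample)
if level == "privacy":
    print("hadoop.rpc.protection is 'privacy'; see the linked libhdfs3 issue")
```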

pkasinathan commented 8 years ago

Thanks for the quick reply.

As per your suggestion, I installed libgsasl=1.8.0 using the command conda install -c anaconda libgsasl=1.8.0, and it resolved my problem. I can now successfully access the kerberized cluster using hdfs3.

You rock!

quasiben commented 8 years ago

Thanks @jettify for chiming in! Closing.

saurabh02 commented 5 years ago

@prabhu1984 I know it's been a while since this issue was raised, but I thought I'd take a shot. I'm seeking clarification on exactly which settings worked for you. Are the following the only things you did?

- HDFileSystem(host=None, port=None, user=None, ticket_cache=None, token=None, pars=None, connect=True)
- conda install -c anaconda libgsasl=1.8.0

Or did you also do the other things suggested here, such as compiling libgsasl with the flag --with-gssapi-impl=mit?

I'm also facing issues connecting to a kerberized cluster using hdfs3. When I did exactly the two things shown above, my Dask job was killed with the error: distributed.scheduler.KilledWorker: ('__call__-6af7aa29-2a09-45f3-a5e2-207c06562672', <Worker 'tcp://10.194.211.132:11927', memory: 0, processing: 1>)

pkasinathan commented 5 years ago

Thanks for getting back to me. It's an old issue. We are able to use hdfs3 to connect to a kerberized cluster. This issue can be closed.

saurabh02 commented 5 years ago

@prabhu1984 I'm not a Dask developer; I'm just seeking your help on how you fixed the issue. Could you please share exactly which settings worked for you?