druid-io / pydruid

A Python connector for Druid

Add kerberos auth #125

Open Dubrzr opened 6 years ago

Dubrzr commented 6 years ago

Some Druid clusters run with Kerberos enabled, and it would be nice if pydruid could work with these kerberized instances. I noticed that you use the requests library to call the Druid HTTP API, and there is a requests-kerberos library that adds Kerberos authentication to requests. Would it be possible to integrate it into pydruid?

The only change needed would be to pass an extra auth argument to the requests calls, as in this example:

import requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED
kerberos_auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, sanitize_mutual_error_response=False)
r = requests.get("https://windows.example.org/wsman", auth=kerberos_auth)
mistercrunch commented 6 years ago

If we had some sort of config.py and a way to override the variables declared in it (say, in a druid_config.py), we could add a new config var called REQUESTS_AUTH, which we would then have to hook up in all requests calls.

Your local druid_config.py would simply set:

REQUESTS_AUTH = HTTPKerberosAuth(mutual_authentication=REQUIRED, sanitize_mutual_error_response=False)

First, though, I don't think there's a standard way to do local configuration discovery and overrides. For me, a requirement is that the configuration be code (no .cfg or .ini please!), since configuration is often best expressed with objects (in your case, an instance of HTTPKerberosAuth). It should have default values and let users override only the ones they want to change.

Here's how it's done in Superset: https://github.com/apache/incubator-superset/blob/master/superset/config.py#L412

config.py defines the defaults and patches itself with whatever it finds in superset_config.py.
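A minimal sketch of that defaults-plus-override pattern in Python. The module name druid_config and the REQUESTS_AUTH key come from this comment; DEFAULTS and load_config are hypothetical names, and treating only UPPERCASE names as settings is an assumption borrowed from common config-module practice, not something pydruid defines.

```python
import importlib
from typing import Any, Dict

# Hypothetical defaults that a pydruid config.py could declare.
DEFAULTS: Dict[str, Any] = {
    "REQUESTS_AUTH": None,  # any requests auth object, e.g. HTTPKerberosAuth(...)
}


def load_config(override_module: str = "druid_config") -> Dict[str, Any]:
    """Return DEFAULTS patched with the UPPERCASE names found in a local override module."""
    config = dict(DEFAULTS)
    try:
        local = importlib.import_module(override_module)
    except ModuleNotFoundError as exc:
        if exc.name != override_module:
            raise  # the override module exists, but one of its own imports failed
        return config  # no local override on sys.path: keep every default
    for name in dir(local):
        if name.isupper():  # assumed convention: only UPPERCASE names count as settings
            config[name] = getattr(local, name)
    return config
```

With a druid_config.py on sys.path that sets REQUESTS_AUTH = HTTPKerberosAuth(...), load_config()["REQUESTS_AUTH"] would return that object while every other setting keeps its default.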

Dubrzr commented 6 years ago

Ok I see :)

What do you think of simply adding a requests_auth=None parameter to the pydruid functions that call requests, and passing it down to those requests calls?

Something like:

import requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED
kerberos_auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, sanitize_mutual_error_response=False)

from pydruid.db import connect

conn = connect(host='localhost', port=8082, path='/druid/v2/sql/', scheme='http',
               requests_auth=kerberos_auth)
curs = conn.cursor()
curs.execute("""
    SELECT place,
           CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
           CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
      FROM places
     LIMIT 10
""")
for row in curs:
    print(row)

where connect would call requests like this:

def connect(..., requests_auth=None):
    ...
    requests.get(..., auth=requests_auth)
    ...
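To make the proposal concrete, here is a runnable sketch of how a requests_auth argument given to connect could travel down to the HTTP call. Connection, Cursor, build_request and this connect are hypothetical stand-ins, not pydruid's actual classes; the real APIs used are requests.Request, PreparedRequest and Session.send, and the JSON body {"query": ...} assumes Druid's SQL HTTP endpoint.

```python
from typing import Optional

import requests
from requests.auth import AuthBase


class Connection:
    """Hypothetical stand-in for a pydruid.db connection (not pydruid's real class)."""

    def __init__(self, url: str, requests_auth: Optional[AuthBase] = None) -> None:
        self.url = url
        self.requests_auth = requests_auth  # captured once, at connect() time

    def cursor(self) -> "Cursor":
        return Cursor(self)


class Cursor:
    def __init__(self, connection: Connection) -> None:
        self.connection = connection

    def build_request(self, sql: str) -> requests.PreparedRequest:
        # The auth object travels from the connection down to the HTTP request.
        return requests.Request(
            "POST",
            self.connection.url,
            json={"query": sql},
            auth=self.connection.requests_auth,
        ).prepare()

    def execute(self, sql: str) -> requests.Response:
        with requests.Session() as session:
            response = session.send(self.build_request(sql))
        response.raise_for_status()
        return response


def connect(url: str, requests_auth: Optional[AuthBase] = None) -> Connection:
    """Hypothetical connect() that only records the auth object for later calls."""
    return Connection(url, requests_auth=requests_auth)
```

Calling connect(url, requests_auth=kerberos_auth).cursor().execute("SELECT 1") would then let requests-kerberos handle the negotiation on the POST; with requests_auth=None no Authorization header is added.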

Also, I noticed that you are using urllib as well; why not use requests everywhere?

mistercrunch commented 6 years ago

Looks like you'd need to add it to many entry points and carry it through many layers down the stack.
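One way to reduce that plumbing, offered here as an assumption rather than anything agreed in this thread, is to build a requests.Session once and hand that object down instead of a bare auth argument: requests applies Session.auth to every request sent through the session unless the request supplies its own auth. make_session and run_sql are hypothetical helper names.

```python
from typing import Any, Optional

import requests
from requests.auth import AuthBase


def make_session(auth: Optional[AuthBase] = None) -> requests.Session:
    """Hypothetical helper: configure authentication once on a shared Session."""
    session = requests.Session()
    session.auth = auth  # merged into every request sent through this session
    return session


def run_sql(session: requests.Session, url: str, sql: str) -> Any:
    """Any layer that receives the session gets the authentication for free."""
    response = session.post(url, json={"query": sql})
    response.raise_for_status()
    return response.json()
```

For a kerberized cluster, make_session(HTTPKerberosAuth(mutual_authentication=REQUIRED)) would be built once, and every function that receives the session inherits the authentication; other settings such as session.verify or session.proxies could live on the same object.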

sanveera commented 5 years ago

Could someone please post some sample code for Kerberos auth?

GuidoTournois commented 2 years ago

Hi all, it's been a while since there was any activity here. At Adyen we need to connect to our Druid cluster with Kerberos authentication, so I was wondering what the status is. I am happy to contribute.