cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0

HiveServer2Connection.cursor has always user=None when called from ImpalaExecutionContext #473

Open marqueewinq opened 2 years ago

marqueewinq commented 2 years ago

Steps to reproduce:

  1. Create an Impala + LDAP stack with docker-compose:

    version: "3.5"

    services:
      impala:
        image: codingtony/impala
        command: /start-bash.sh && /bin/true
        stdin_open: true
        tty: true
        privileged: true
        ports:

  2. Create the script `connect.py`:

    # connect.py
    import logging

    logging.basicConfig(level=logging.DEBUG)
    from impala.dbapi import connect
    from sqlalchemy import create_engine, inspect

    host = "localhost"
    port = "21050"
    username = "admin"
    password = "admin"
    database = "default"
    use_ssl = False
    # auth_mechanism = "NOSASL" # or "LDAP"
    auth_mechanism = "LDAP"

    engine = create_engine(
        "impala://",
        connect_args={},
        creator=lambda: connect(
            host=host,
            port=port,
            database=database,
            timeout=5,
            user=username,
            password=password,
            use_ssl=use_ssl,
            auth_mechanism=auth_mechanism,
        ),
    )
    connection = engine.connect().execution_options(user=username)
    inspector = inspect(connection)
    table_names = inspector.get_table_names()
    print(table_names)

 3. Execute the script with `python3 connect.py 2> >(grep 'req=TOpenSessionReq')`

Expected output:

DEBUG:impala.hiveserver2:OpenSession: req=TOpenSessionReq(client_protocol=5, username='admin', password=None, configuration={})
[]


Actual output (notice the user name):

DEBUG:impala.hiveserver2:OpenSession: req=TOpenSessionReq(client_protocol=5, username='marqueewinq', password=None, configuration={})
[]

marqueewinq commented 2 years ago

With a little pdb-ing I found that the `cursor` method of `HiveServer2Connection` does not receive the `user` argument from `ImpalaExecutionContext`.

I don't see a way to pass the user name from the script to the `cursor` method of `HiveServer2Connection`.

I'm not sure what the correct solution would be here; maybe read the user from `execution_options`, the same way the `configuration` argument of the `cursor` method is read.
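To make the failure mode concrete, here is a minimal sketch that uses stand-in code rather than impyla's real classes. It assumes that `cursor` falls back to the OS login name (via `getpass.getuser()`) when `user` is `None`, which would explain why `marqueewinq` shows up in the actual output. `StubConnection` and the two `create_cursor_*` helpers are hypothetical names used only for illustration.

```python
import getpass
from unittest import mock


class StubConnection:
    """Stand-in for HiveServer2Connection; NOT impyla's real class."""

    def cursor(self, user=None, configuration=None):
        # Assumption: with no explicit user, the OS login name is used.
        if user is None:
            user = getpass.getuser()
        # A real cursor would open a session here; the stub only reports
        # the username that would end up in the open-session request.
        return {"username": user, "configuration": configuration or {}}


def create_cursor_without_user(dbapi_connection, execution_options):
    # Behaviour as described in the report: only the configuration is forwarded.
    configuration = execution_options.get("cursor_configuration", {})
    return dbapi_connection.cursor(configuration=configuration)


def create_cursor_with_user(dbapi_connection, execution_options):
    # Proposed behaviour: also forward the user from execution_options.
    configuration = execution_options.get("cursor_configuration", {})
    user = execution_options.get("user")
    return dbapi_connection.cursor(user=user, configuration=configuration)


def demo():
    options = {"user": "admin"}
    conn = StubConnection()
    # Pretend the OS login name is the one seen in the report.
    with mock.patch("getpass.getuser", return_value="marqueewinq"):
        without = create_cursor_without_user(conn, options)["username"]
        with_user = create_cursor_with_user(conn, options)["username"]
    return without, with_user


print(demo())  # ('marqueewinq', 'admin')
```

In the sketch, the `user` set through `execution_options` only reaches the session when the cursor factory forwards it explicitly.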

marqueewinq commented 2 years ago

Monkey patching `ImpalaExecutionContext.create_cursor` helps:

# connect.py
import logging

logging.basicConfig(level=logging.DEBUG)
from impala.dbapi import connect
from sqlalchemy import create_engine, inspect

from impala.sqlalchemy import ImpalaExecutionContext

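def my_create_cursor(self):
    # Replacement for ImpalaExecutionContext.create_cursor that also
    # forwards the "user" execution option to the DB API cursor.
    self._is_server_side = False
    cursor_configuration = self.execution_options.get("cursor_configuration", {})
    username = self.execution_options.get("user", None)
    return self._dbapi_connection.cursor(
        user=username, configuration=cursor_configuration
    )

# Apply the monkey patch before any engine or connection is created.
ImpalaExecutionContext.create_cursor = my_create_cursor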

host = "localhost"
port = "21050"
username = "admin"
password = "admin"
database = "default"
use_ssl = False
# auth_mechanism = "NOSASL" # or "LDAP"
auth_mechanism = "LDAP"

engine = create_engine(
    "impala://",
    connect_args={},
    creator=lambda: connect(
        host=host,
        port=port,
        database=database,
        timeout=5,
        user=username,
        password=password,
        use_ssl=use_ssl,
        auth_mechanism=auth_mechanism,
    ),
)
connection = engine.connect().execution_options(user=username)
inspector = inspect(connection)
table_names = inspector.get_table_names()
print(table_names)

Output:

$ python3 connect.py 2> >(grep 'req=TOpenSessionReq')
DEBUG:impala.hiveserver2:OpenSession: req=TOpenSessionReq(client_protocol=5, username='admin', password=None, configuration={})
[]

marqueewinq commented 2 years ago

I would happily create a PR with tests, but I need advice from the maintainers.
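As a rough idea of what such a test could look like, the sketch below exercises the same `create_cursor` body as the monkey patch earlier in this thread against a mocked DB API connection, so no Impala server is needed. `make_context` is a hypothetical helper that stands in for `ImpalaExecutionContext`; a real test in the impyla repository would likely construct the context differently.

```python
from types import SimpleNamespace
from unittest import mock


def create_cursor(self):
    # Same body as the monkey patch shown earlier in this thread.
    self._is_server_side = False
    cursor_configuration = self.execution_options.get("cursor_configuration", {})
    username = self.execution_options.get("user", None)
    return self._dbapi_connection.cursor(
        user=username, configuration=cursor_configuration
    )


def make_context(execution_options):
    # Hypothetical stand-in exposing only the attributes create_cursor uses.
    return SimpleNamespace(
        execution_options=execution_options,
        _dbapi_connection=mock.Mock(),
    )


def test_user_is_forwarded():
    ctx = make_context({"user": "admin"})
    cursor = create_cursor(ctx)
    ctx._dbapi_connection.cursor.assert_called_once_with(
        user="admin", configuration={}
    )
    assert cursor is ctx._dbapi_connection.cursor.return_value


def test_user_defaults_to_none():
    # Without a "user" option the cursor still gets user=None, so the
    # existing fallback inside cursor() would be left unchanged.
    ctx = make_context({"cursor_configuration": {"k": "v"}})
    create_cursor(ctx)
    ctx._dbapi_connection.cursor.assert_called_once_with(
        user=None, configuration={"k": "v"}
    )


test_user_is_forwarded()
test_user_defaults_to_none()
```

The two cases cover the option being present and absent, which is the behaviour a maintainer would most likely want pinned down before accepting the change.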