cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0
731 stars 248 forks source link

impala connection via sqlalchemy #214

Open okyere opened 8 years ago

okyere commented 8 years ago

I'm new to hadoop and impala. After many days, I've managed to connect to using

from impala.dbapi import connect from impala.util import as_pandas conn = connect(host="server.lrd..com",port=21050, database='tcad',auth_mechanism='PLAIN', user="alexcj", use_ssl=True,timeout=20, password="secret1pass")

cursor = conn.cursor() cursor.execute('SELECT * FROM bom_2014_m LIMIT 10') df = as_pandas(cursor)

Basically, I'm finally able to connect, query and create a dataframe from returned results.

Now, how do I connect to impala with sqlalchemy? I've seen

engine = create_engine('impala://localhost') in the test files but my connection string has whole lot more parameters than the host. How do I pass all my parameters above to create_engine? I'm designing a backend api with flask that will query from impala so this step is important. If you're familiar with flask-sqlalchemy, I'd like to know exactly what to pass to SQLALCHEMY_DATABASE_URI . If not, I'm willing to use sqlalchemy's declarative extension hence this question. Thanks.

pyite1 commented 8 years ago
import sqlalchemy
def conn():
    return connect(host='some_host', 
                             port=21050,
                             database='default',
                             timeout=20,
                             use_ssl=True,
                             ca_cert='some_pem',
                             user=user, password=pwd,
                             auth_mechanism='PLAIN')

engine = sqlalchemy.create_engine('impala://', creator=conn)

you may or may not need some of those parameters

jrburris commented 7 years ago

Does anyone know what is needed to connect to impala if we are using Kerberos?

pyite1 commented 7 years ago

Here are the steps that worked for me to install impyla/ibis in a kerberized environment (some of these are obvious)

Form a fresh conda environment I did the following:

1) Install Impyla (bitarray and thrift will install as dependencies 2) Install thrift_sasl 3) Install pure_sasl (I was unable to use sasl; but feel free to give it a shot) 4) Install numpy (Ibis dependency, if needed) 5) Install toolz (Ibis dependency, if needed) 6) Install hdfs 7) Install pandas 8) Install sqlalchemy 9) Install pykerberos 10) Install thriftpy

With that environment, and after a successful kinit, I was able to connect to impyla with the following:

import ibis
client = ibis.impala.connect(host='', port='', timeout='' use_ssl = True, auth_mechanism='GSSAPI', kerberos_service_name='')
PegasusWang commented 7 years ago
import time

from sqlalchemy import create_engine, MetaData, Table, select, and_

ENGINE = create_engine(
    'impala://{host}:{port}/{database}'.format(
        host=host,    # your host
        port=port,
        database=database,
    )
)
METADATA = MetaData(ENGINE)
TABLES = {
    'table': Table('table_name', METADATA, autoload=True),

}
yiakwy commented 6 years ago

@pyite1 Your solution is incomplete because Flask-SQLAlchemy does not support it yet. @okyere Have you resolved the problem in Flask-SQLAlchemy?

zslim commented 3 years ago

Flask-SQLAlchemy now supports the usage of an engine creator function. It can be passed using the engine_options parameter when creating the SQLAlchemy object. See docs: https://flask-sqlalchemy.palletsprojects.com/en/2.x/api/?highlight=engine_options

matteoaurelio commented 2 years ago

Here are the steps that worked for me to install impyla/ibis in a kerberized environment (some of these are obvious)

Form a fresh conda environment I did the following:

  1. Install Impyla (bitarray and thrift will install as dependencies
  2. Install thrift_sasl
  3. Install pure_sasl (I was unable to use sasl; but feel free to give it a shot)
  4. Install numpy (Ibis dependency, if needed)
  5. Install toolz (Ibis dependency, if needed)
  6. Install hdfs
  7. Install pandas
  8. Install sqlalchemy
  9. Install pykerberos
  10. Install thriftpy

With that environment, and after a successful kinit, I was able to connect to impyla with the following:

import ibis
client = ibis.impala.connect(host='', port='', timeout='' use_ssl = True, auth_mechanism='GSSAPI', kerberos_service_name='')

Quick question, is your host and port for Impala? How do you find the parameter 'kerberos_service_name'?

frbelotto commented 1 month ago
import sqlalchemy
def conn():
    return connect(host='some_host', 
                             port=21050,
                             database='default',
                             timeout=20,
                             use_ssl=True,
                             ca_cert='some_pem',
                             user=user, password=pwd,
                             auth_mechanism='PLAIN')

engine = sqlalchemy.create_engine('impala://', creator=conn)

you may or may not need some of those parameters

Sorry to recover such an old post, but nowadays, your suggestion raises an alert. Any suggestions to handle it?

_SADeprecationWarning: The dbapi() classmethod on dialect classes has been renamed to import_dbapi().  Implement an import_dbapi() classmethod directly on class <class 'impala.sqlalchemy.ImpalaDialect'> to remove this warning; the old .dbapi() classmethod may be maintained for backwards compatibility.
  engine = sqlalchemy.create_engine('impala://', creator=conn)_