Open okyere opened 8 years ago
import sqlalchemy
def conn():
return connect(host='some_host',
port=21050,
database='default',
timeout=20,
use_ssl=True,
ca_cert='some_pem',
user=user, password=pwd,
auth_mechanism='PLAIN')
engine = sqlalchemy.create_engine('impala://', creator=conn)
you may or may not need some of those parameters
Does anyone know what is needed to connect to impala if we are using Kerberos?
Here are the steps that worked for me to install impyla/ibis in a kerberized environment (some of these are obvious)
Form a fresh conda environment I did the following:
1) Install Impyla (bitarray and thrift will install as dependencies 2) Install thrift_sasl 3) Install pure_sasl (I was unable to use sasl; but feel free to give it a shot) 4) Install numpy (Ibis dependency, if needed) 5) Install toolz (Ibis dependency, if needed) 6) Install hdfs 7) Install pandas 8) Install sqlalchemy 9) Install pykerberos 10) Install thriftpy
With that environment, and after a successful kinit, I was able to connect to impyla with the following:
import ibis
client = ibis.impala.connect(host='', port='', timeout='' use_ssl = True, auth_mechanism='GSSAPI', kerberos_service_name='')
import time
from sqlalchemy import create_engine, MetaData, Table, select, and_
ENGINE = create_engine(
'impala://{host}:{port}/{database}'.format(
host=host, # your host
port=port,
database=database,
)
)
METADATA = MetaData(ENGINE)
TABLES = {
'table': Table('table_name', METADATA, autoload=True),
}
@pyite1 Your solution is incomplete because Flask-SQLAlchemy does not support it yet. @okyere Have you resolved the problem in Flask-SQLAlchemy?
Flask-SQLAlchemy now supports the usage of an engine creator function. It can be passed using the engine_options
parameter when creating the SQLAlchemy
object. See docs: https://flask-sqlalchemy.palletsprojects.com/en/2.x/api/?highlight=engine_options
Here are the steps that worked for me to install impyla/ibis in a kerberized environment (some of these are obvious)
Form a fresh conda environment I did the following:
- Install Impyla (bitarray and thrift will install as dependencies
- Install thrift_sasl
- Install pure_sasl (I was unable to use sasl; but feel free to give it a shot)
- Install numpy (Ibis dependency, if needed)
- Install toolz (Ibis dependency, if needed)
- Install hdfs
- Install pandas
- Install sqlalchemy
- Install pykerberos
- Install thriftpy
With that environment, and after a successful kinit, I was able to connect to impyla with the following:
import ibis client = ibis.impala.connect(host='', port='', timeout='' use_ssl = True, auth_mechanism='GSSAPI', kerberos_service_name='')
Quick question, is your host and port for Impala? How do you find the parameter 'kerberos_service_name'?
import sqlalchemy def conn(): return connect(host='some_host', port=21050, database='default', timeout=20, use_ssl=True, ca_cert='some_pem', user=user, password=pwd, auth_mechanism='PLAIN') engine = sqlalchemy.create_engine('impala://', creator=conn)
you may or may not need some of those parameters
Sorry to recover such an old post, but nowadays, your suggestion raises an alert. Any suggestions to handle it?
_SADeprecationWarning: The dbapi() classmethod on dialect classes has been renamed to import_dbapi(). Implement an import_dbapi() classmethod directly on class <class 'impala.sqlalchemy.ImpalaDialect'> to remove this warning; the old .dbapi() classmethod may be maintained for backwards compatibility.
engine = sqlalchemy.create_engine('impala://', creator=conn)_
I'm new to hadoop and impala. After many days, I've managed to connect to using
from impala.dbapi import connect
from impala.util import as_pandas
conn = connect(host="server.lrd..com",port=21050, database='tcad',auth_mechanism='PLAIN', user="alexcj", use_ssl=True,timeout=20, password="secret1pass")
cursor = conn.cursor()
cursor.execute('SELECT * FROM bom_2014_m LIMIT 10')
df = as_pandas(cursor)
Basically, I'm finally able to connect, query and create a dataframe from returned results.
Now, how do I connect to impala with sqlalchemy? I've seen
engine = create_engine('impala://localhost')
in the test files but my connection string has whole lot more parameters than the host. How do I pass all my parameters above tocreate_engine
? I'm designing a backend api with flask that will query from impala so this step is important. If you're familiar with flask-sqlalchemy, I'd like to know exactly what to pass to SQLALCHEMY_DATABASE_URI . If not, I'm willing to use sqlalchemy's declarative extension hence this question. Thanks.