dropbox / PyHive

Python interface to Hive and Presto. 🐝

Could the connection timeout be configured? #162

Open qiangchow opened 6 years ago

qiangchow commented 6 years ago

Hi,

When using hive.connect, can a timeout be configured?

cursor = hive.connect(host='xxx', port=xxx, database=xxx, auth='KERBEROS', kerberos_service_name=xxx).cursor()
cursor.execute('SELECT * FROM xxx')

I didn't see a timeout parameter. Thanks.

`class Connection(object):
    """Wraps a Thrift session"""

    def __init__(self, host=None, port=None, username=None, database='default', auth=None,
                 configuration=None, kerberos_service_name=None, password=None,
                 thrift_transport=None):
        """Connect to HiveServer2

        :param host: What host HiveServer2 runs on
        :param port: What port HiveServer2 runs on. Defaults to 10000.
        :param auth: The value of hive.server2.authentication used by HiveServer2.
            Defaults to ``NONE``.
        :param configuration: A dictionary of Hive settings (functionally same as the `set` command)
        :param kerberos_service_name: Use with auth='KERBEROS' only
        :param password: Use with auth='LDAP' only
        :param thrift_transport: A ``TTransportBase`` for custom advanced usage.
            Incompatible with host, port, auth, kerberos_service_name, and password.`

Traceback (most recent call last):
  File "/Users/xxx/Documents/dev/venv/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 104, in open
    handle.connect(sockaddr)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 60] Operation timed out

marklitle commented 6 years ago

Same question. Are there any examples of the configuration parameter?

Adamage commented 6 years ago

Can anyone explain how to pass a timeout to the connection? Is it an element of the configuration dictionary?

rcmgleite commented 5 years ago

+1

niyanchun commented 5 years ago

+1

Alovez commented 5 years ago

+1

fgimian commented 5 years ago

Sadly, it seems that PyHive doesn't provide this. In the PyHive source, the socket is created like this:

socket = thrift.transport.TSocket.TSocket(host, port)

One may then call the following TSocket method to set the timeout:

socket.setTimeout(timeout_ms)
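Since `setTimeout` takes milliseconds while most callers think in seconds, a small conversion helper can avoid unit mistakes. The following is only a sketch: `seconds_to_millis` and `make_socket_with_timeout` are hypothetical names, not part of PyHive or Thrift, and the Thrift import is done lazily so the conversion helper can be used on its own.

```python
def seconds_to_millis(timeout_s):
    """Convert a timeout in seconds to whole milliseconds.

    None is passed through unchanged, since TSocket.setTimeout(None)
    means "no timeout".
    """
    if timeout_s is None:
        return None
    if timeout_s <= 0:
        raise ValueError("timeout must be positive")
    return int(round(timeout_s * 1000))


def make_socket_with_timeout(host, port, timeout_s):
    """Create an unopened TSocket with its timeout already set (hypothetical helper)."""
    # Imported here so that seconds_to_millis works without thrift installed.
    from thrift.transport.TSocket import TSocket

    sock = TSocket(host, port)
    sock.setTimeout(seconds_to_millis(timeout_s))
    return sock
```

The socket is not opened here; the transport that wraps it is expected to do that when the connection is established.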

In my case, I am using PLAIN authentication, so I just implemented a little function like so:

import sasl
from thrift_sasl import TSaslClientTransport
from thrift.transport.TSocket import TSocket

def create_hive_plain_transport(host, port, username, password, timeout=60):
    """Build a SASL PLAIN transport whose socket timeout is given in seconds."""
    socket = TSocket(host, port)
    # TSocket.setTimeout expects milliseconds
    socket.setTimeout(timeout * 1000)

    sasl_auth = 'PLAIN'

    def sasl_factory():
        # Called by the transport to create the SASL client
        sasl_client = sasl.Client()
        sasl_client.setAttr('host', host)
        sasl_client.setAttr('username', username)
        sasl_client.setAttr('password', password)
        sasl_client.init()
        return sasl_client

    return TSaslClientTransport(sasl_factory, sasl_auth, socket)

And now, when running connect, I use this function to create the thrift transport:

from pyhive import hive

# host, port, and credentials go to the transport, not to hive.connect
hive.connect(
    thrift_transport=create_hive_plain_transport(
        host='bla',
        port=10000,
        username='me',
        password='password',
        timeout=120
    ),
    database='bla'
)
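For the Kerberos setup in the original question, the same idea could be adapted roughly as follows. This is an untested sketch modeled on the PLAIN function above: `create_hive_kerberos_transport` is a hypothetical name, and the `'GSSAPI'` mechanism and the `'service'` attribute are assumptions about how the SASL client should be configured for Kerberos, so check them against your environment.

```python
def create_hive_kerberos_transport(host, port, kerberos_service_name, timeout=60):
    """Build a SASL transport for Kerberos with a socket timeout in seconds.

    Hypothetical helper; assumes a valid Kerberos ticket already exists.
    """
    # Third-party imports are kept inside the function so that defining
    # it does not require the libraries to be installed.
    import sasl
    from thrift.transport.TSocket import TSocket
    from thrift_sasl import TSaslClientTransport

    sock = TSocket(host, port)
    sock.setTimeout(timeout * 1000)  # setTimeout expects milliseconds

    def sasl_factory():
        # Assumption: the Kerberos service name is passed as 'service'
        sasl_client = sasl.Client()
        sasl_client.setAttr('host', host)
        sasl_client.setAttr('service', kerberos_service_name)
        sasl_client.init()
        return sasl_client

    # Assumption: 'GSSAPI' is the SASL mechanism used for Kerberos
    return TSaslClientTransport(sasl_factory, 'GSSAPI', sock)
```

The result would be passed as `thrift_transport=` to `hive.connect`, leaving out `host`, `port`, `auth`, and `kerberos_service_name`, which the docstring quoted earlier lists as incompatible with `thrift_transport`.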

See the transport setup code in PyHive's source for inspiration (as I did) 😄

I picked up this approach from the pyhs2 Connection constructor.

Hope this helps someone 😄

Fotis

wilberh commented 4 years ago

Are there any plans to add a timeout param to hive.connect?

AmineBenami commented 3 years ago

Are there any new insights on this small config option?

darrkz commented 2 years ago

I tried adding a timeout argument and value, but it failed.