Open alexwoolford opened 8 years ago
It looks like PyHive connects to Hive using a different protocol:
https://github.com/dropbox/PyHive/blob/master/pyhive/hive.py#L87
Can you try changing V6 to V1 and see if that fixes your issue? If this works I can add a configuration option. I can also try to repro on my side.
Thanks Wes. It's an honor!
I changed the client from V6 to V1 and it still hangs. I can share my screen via Google Hangout if that's helpful.
It would be nice to understand this issue if you do figure out -- unfortunately I don't have the bandwidth to help debug it further. If you are able to sort it out (and if it is an impyla bug) please let me know the resolution here. cc @mjacobs
@alexwoolford @wesm I was facing the same issue. Changing the auth_mechanism to 'PLAIN' worked for me.
Here's the documentation in the code:
auth_mechanism : {'NOSASL', 'PLAIN', 'GSSAPI', 'LDAP'} Specify the authentication mechanism.
'NOSASL'for unsecured Impala.
'PLAIN'for unsecured Hive (because Hive requires the SASL transport).
'GSSAPI'for Kerberos and
'LDAP'for Kerberos with LDAP.
The default used is NOSASL:
def connect(host='localhost', port=21050, database=None, timeout=None, use_ssl=False, ca_cert=None, auth_mechanism='NOSASL', user=None, password=None, kerberos_service_name='impala', use_ldap=None, ldap_user=None, ldap_password=None, use_kerberos=None, protocol=None):
@amanjunath thanks man.
Below worked for me. Just adding auth_mechanism='PLAIN'
:
conn = connect(host='myhost', port=10000, auth_mechanism='PLAIN', user='myuser', password=None)
not support ‘PLAIN’ TTransportException: Bad status: 3 (Unsupported mechanism type PLAIN)
@wesm
Where my code hangs is
results = cursor.fetchmany() <--
and it happen only occasionally and when I rerun my code, it goes ok...
@wesm I've been running a load test on our cluster of 192 processes by 14 queries. I see that my machine establishes 192 connections to the cluster using netstat but via logging, I see that I only receive 180 cursors. The remaining 12 cursors hang for over 1 hour until they return a cursor object from the connection. Adjusting the impala maximum connections did not help. I am using a single connection and cursor to execute all 14 queries with a sleep in between each query to simulate concurrent user action.
Do you have any thoughts on this?
Thanks Joe
Hang occurs @ TOpenSessionReq
Attempting to open transport (tries_left=2)
Transport opened
Establishing Connection
Connecting to HiveServer2 hostname:25003 with PLAIN authentication mechanism
get_socket: host=hostname port=25003 use_ssl=False ca_cert=None
sock=<thrift.transport.TSocket.TSocket instance at 0x7f765fea0aa0>
get_transport: socket=<thrift.transport.TSocket.TSocket instance at 0x7f765fea0aa0> host=hostname kerberos_service_name=impala auth_mechanism=PLAIN user=userpassword=fuggetaboutit
transport=<thrift_sasl.TSaslClientTransport instance at 0x7f765fea0e60> protocol=<thrift.protocol.TBinaryProtocol.TBinaryProtocolAccelerated instance at 0x7f765fea7140> service=<impala._thrift_gen.ImpalaService.ImpalaHiveServer2Service.Client object at 0x7f765fe9dd50>
HiveServer2Connection(service=<impala.hiveserver2.HS2Service object at 0x7f765fe9dd90>, default_db=co5012_cpi_int)
Connection Established
Acquiring Cursor
Getting a cursor (Impala session)
.cursor(): getting new session_handle
OpenSession: req=TOpenSessionReq(username='root', password=None, client_protocol=5, configuration=None)
Attempting to open transport (tries_left=3)
Transport opened
Here' s more info from another process - Notice the 1 hour and 40 minute hang
`
Apr 1, 2020 @ 19:57:56.000 | Transport opened |
---|---|
Apr 1, 2020 @ 19:57:56.000 | OpenSession: resp=TOpenSessionResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret='\xaa\n\xfb=y\xdaI\t\x98u\r6>Ss0', guid='.>\t\x12\xff\xd9G\xbb\x80\x01U\x05a\x87\xad\x89')), configuration={'RUNTIME_BLOOM_FILTER_SIZE': '1048576', 'DECIMAL_V2': '0', 'QUERY_TIMEOUT_S': '0', 'RM_INITIAL_MEM': '0', 'HBASE_CACHE_BLOCKS': '0', 'SCHEDULE_RANDOM_REPLICA': '0', 'DEFAULT_ORDER_BY_LIMIT': '-1', 'REPLICA_PREFERENCE': '0', 'RUNTIME_FILTER_MODE': '2', 'HBASE_CACHING': '0', 'DISABLE_CODEGEN': '0', 'S3_SKIP_INSERT_STAGING': '1', 'ABORT_ON_ERROR': '0', 'PREFETCH_MODE': '1', 'MAX_ROW_SIZE': '524288', 'PARQUET_FILE_SIZE': '0', 'DISABLE_STREAMING_PREAGGREGATIONS': '0', 'SHUFFLE_DISTINCT_EXPRS': '1', 'COMPRESSION_CODEC': '', 'DISABLE_ROW_RUNTIME_FILTERING': '0', 'DISABLE_OUTERMOST_TOPN': '0', 'MIN_SPILLABLE_BUFFER_SIZE': '65536', 'PARQUET_READ_STATISTICS': '1', 'http_addr': 'c3kuhdpnode7.premierinc.com:25000', 'STRICT_MODE': '0', 'BATCH_SIZE': '0', 'MAX_SCAN_RANGE_LENGTH': '0', 'DISABLE_CODEGEN_ROWS_THRESHOLD': '50000', 'MT_DOP': '', 'RUNTIME_FILTER_MIN_SIZE': '1048576', 'PARQUET_FALLBACK_SCHEMA_RESOLUTION': '0', 'NUM_NODES': '0', 'ALLOW_UNSUPPORTED_FORMATS': '0', 'EXEC_TIME_LIMIT_S': '0', 'RUNTIME_FILTER_MAX_SIZE': '16777216', 'NUM_SCANNER_THREADS': '0', 'SCAN_NODE_CODEGEN_THRESHOLD': '1800000', 'MAX_NUM_RUNTIME_FILTERS': '10', 'EXPLAIN_LEVEL': '1', 'SCRATCH_LIMIT': '-1', 'SEQ_COMPRESSION_MODE': '', 'MAX_ERRORS': '100', 'MAX_IO_BUFFERS': '0', 'COMPUTE_STATS_MIN_SAMPLE_SIZE': '1073741824', 'RUNTIME_FILTER_WAIT_TIME_MS': '0', 'PARQUET_DICTIONARY_FILTERING': '1', 'DEBUG_ACTION': '', 'PARQUET_ARRAY_RESOLUTION': '2', 'DISABLE_UNSAFE_SPILLS': '0', 'MAX_MEM_ESTIMATE_FOR_ADMISSION': '0', 'BUFFER_POOL_LIMIT': '', 'RESERVATION_REQUEST_TIMEOUT': '', 'SYNC_DDL': '0', 'PARQUET_ANNOTATE_STRINGS_UTF8': '0', 'DISABLE_CACHED_READS': '0', 'ABORT_ON_DEFAULT_LIMIT_EXCEEDED': '0', 'APPX_COUNT_DISTINCT': '0', 'V_CPU_CORES': '', 'MEM_LIMIT': '0', 'OPTIMIZE_PARTITION_KEY_SCANS': '0', 'DEFAULT_JOIN_DISTRIBUTION_MODE': '0', 'DEFAULT_SPILLABLE_BUFFER_SIZE': '2097152', 'REQUEST_POOL': '', 'IDLE_SESSION_TIMEOUT': '0', 'EXEC_SINGLE_NODE_ROWS_THRESHOLD': '100', 'ENABLE_EXPR_REWRITES': '0'}, serverProtocolVersion=5) |
Apr 1, 2020 @ 19:57:56.000 | HiveServer2Cursor(service=<impala.hiveserver2.HS2Service object at 0x7f054b8bed90>, session_handle=TSessionHandle(sessionId=THandleIdentifier(secret='\xaa\n\xfb=y\xdaI\t\x98u\r6>Ss0', guid='.>\t\x12\xff\xd9G\xbb\x80\x01U\x05a\x87\xad\x89')), default_config={'RUNTIME_BLOOM_FILTER_SIZE': '1048576', 'DECIMAL_V2': '0', 'QUERY_TIMEOUT_S': '0', 'RM_INITIAL_MEM': '0', 'HBASE_CACHE_BLOCKS': '0', 'SCHEDULE_RANDOM_REPLICA': '0', 'DEFAULT_ORDER_BY_LIMIT': '-1', 'REPLICA_PREFERENCE': '0', 'RUNTIME_FILTER_MODE': '2', 'HBASE_CACHING': '0', 'DISABLE_CODEGEN': '0', 'S3_SKIP_INSERT_STAGING': '1', 'ABORT_ON_ERROR': '0', 'PREFETCH_MODE': '1', 'MAX_ROW_SIZE': '524288', 'PARQUET_FILE_SIZE': '0', 'DISABLE_STREAMING_PREAGGREGATIONS': '0', 'SHUFFLE_DISTINCT_EXPRS': '1', 'COMPRESSION_CODEC': '', 'DISABLE_ROW_RUNTIME_FILTERING': '0', 'DISABLE_OUTERMOST_TOPN': '0', 'MIN_SPILLABLE_BUFFER_SIZE': '65536', 'PARQUET_READ_STATISTICS': '1', 'http_addr': 'c3kuhdpnode7.premierinc.com:25000', 'STRICT_MODE': '0', 'BATCH_SIZE': '0', 'MAX_SCAN_RANGE_LENGTH': '0', 'DISABLE_CODEGEN_ROWS_THRESHOLD': '50000', 'MT_DOP': '', 'RUNTIME_FILTER_MIN_SIZE': '1048576', 'PARQUET_FALLBACK_SCHEMA_RESOLUTION': '0', 'NUM_NODES': '0', 'ALLOW_UNSUPPORTED_FORMATS': '0', 'EXEC_TIME_LIMIT_S': '0', 'RUNTIME_FILTER_MAX_SIZE': '16777216', 'NUM_SCANNER_THREADS': '0', 'SCAN_NODE_CODEGEN_THRESHOLD': '1800000', 'MAX_NUM_RUNTIME_FILTERS': '10', 'EXPLAIN_LEVEL': '1', 'SCRATCH_LIMIT': '-1', 'SEQ_COMPRESSION_MODE': '', 'MAX_ERRORS': '100', 'MAX_IO_BUFFERS': '0', 'COMPUTE_STATS_MIN_SAMPLE_SIZE': '1073741824', 'RUNTIME_FILTER_WAIT_TIME_MS': '0', 'PARQUET_DICTIONARY_FILTERING': '1', 'DEBUG_ACTION': '', 'PARQUET_ARRAY_RESOLUTION': '2', 'DISABLE_UNSAFE_SPILLS': '0', 'MAX_MEM_ESTIMATE_FOR_ADMISSION': '0', 'BUFFER_POOL_LIMIT': '', 'RESERVATION_REQUEST_TIMEOUT': '', 'SYNC_DDL': '0', 'PARQUET_ANNOTATE_STRINGS_UTF8': '0', 'DISABLE_CACHED_READS': '0', 'ABORT_ON_DEFAULT_LIMIT_EXCEEDED': '0', 'APPX_COUNT_DISTINCT': '0', 'V_CPU_CORES': '', 'MEM_LIMIT': '0', 'OPTIMIZE_PARTITION_KEY_SCANS': '0', 'DEFAULT_JOIN_DISTRIBUTION_MODE': '0', 'DEFAULT_SPILLABLE_BUFFER_SIZE': '2097152', 'REQUEST_POOL': '', 'IDLE_SESSION_TIMEOUT': '0', 'EXEC_SINGLE_NODE_ROWS_THRESHOLD': '100', 'ENABLE_EXPR_REWRITES': '0'}, hs2_protocol_version=5) |
Apr 1, 2020 @ 19:57:56.000 | Executing query USE database |
Apr 1, 2020 @ 19:57:56.000 | _execute_async: self._buffer=Batch() self._description=None self._last_operation_active=False self._last_operation=None |
Apr 1, 2020 @ 19:57:56.000 | _reset_state: Resetting cursor state |
Apr 1, 2020 @ 19:57:56.000 | _execute_async: self._buffer=Batch() self._description=None self._last_operation_active=False self._last_operation=None |
Apr 1, 2020 @ 19:57:56.000 | ExecuteStatement: req=TExecuteStatementReq(confOverlay=None, sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret='\xaa\n\xfb=y\xdaI\t\x98u\r6>Ss0', guid='.>\t\x12\xff\xd9G\xbb\x80\x01U\x05a\x87\xad\x89')), runAsync=True, statement='USE az2090_username') |
Apr 1, 2020 @ 19:57:56.000 | Attempting to open transport (tries_left=3) |
Apr 1, 2020 @ 19:57:56.000 | Transport opened |
Apr 1, 2020 @ 18:17:56.000 | Connecting to HiveServer2 hostname:25003 with PLAIN authentication mechanism |
Apr 1, 2020 @ 18:17:56.000 | get_socket: host=hostname port=25003 use_ssl=False ca_cert=None |
Apr 1, 2020 @ 18:17:56.000 | sock=<thrift.transport.TSocket.TSocket instance at 0x7f054b8c1aa0> |
Apr 1, 2020 @ 18:17:56.000 | get_transport: socket=<thrift.transport.TSocket.TSocket instance at 0x7f054b8c1aa0> host=hostname kerberos_service_name=impala auth_mechanism=PLAIN user=cpiint_np password=fuggetaboutit |
Apr 1, 2020 @ 18:17:56.000 | transport=<thrift_sasl.TSaslClientTransport instance at 0x7f054b8c1e60> protocol=<thrift.protocol.TBinaryProtocol.TBinaryProtocolAccelerated instance at 0x7f054b8c8140> service=<impala._thrift_gen.ImpalaService.ImpalaHiveServer2Service.Client object at 0x7f054b8bed50> |
Apr 1, 2020 @ 18:17:56.000 | HiveServer2Connection(service=<impala.hiveserver2.HS2Service object at 0x7f054b8bed90>, default_db=az2090_username) |
Apr 1, 2020 @ 18:17:56.000 | Getting a cursor (Impala session) |
Apr 1, 2020 @ 18:17:56.000 | .cursor(): getting new session_handle |
Apr 1, 2020 @ 18:17:56.000 | OpenSession: req=TOpenSessionReq(username='root', password=None, client_protocol=5, configuration=None) |
Apr 1, 2020 @ 18:17:56.000 | Attempting to open transport (tries_left=3) |
Apr 1, 2020 @ 18:17:56.000 | Transport opened |
`
Exactly a year later, still getting this issue. It seems to hang consistently with certain queries (which only return ~200 rows tops) which are near-instant using Database query tools such as DBeaver. Other queries work fine, even when they are more complicated and return more records.
Occurs on the fetchall()
call inside as_pandas
. Happens even if I use a raw_connection
and call fetchall()
myself. Left it overnight for 8 hours and it was still hanging in the morning. Any help would be appreciated.
I have auth_mechanism='PLAIN'
set, and I have also tried it with create_engine
in SQLAlchemy, but to no avail, because internally they all use fetchall()
. Using PyHive is not an option because unfortunately I'm in a corporate environment on Windows and cannot install any of the libraries, so I can only use Impyla.
Did you ever fix this @darkz1984 ?
EDIT: I seem to have been able to prevent the hang, but at the cost of performance. If I set the cursor's buffersize to 1:
cursor._buffersize = 1
It generally slows down all queries because of the smaller buffer, but it stops hanging. Bear in mind we are dealing with very small queries here (~200 records returned at most), so I'm not sure how this will affect larger queries.
@DVAlexHiggs Hi Alex, I also faced this issue. When I do fetchall() keep stopping without end or error.
I wonder if you solved this problem. Thanks!
When I try to run the following code, the client hangs when trying to connect to Hive:
I was able to connect to HiveServer2, via a Java client, and so it seems that the connectivity issue is Python/Impyla specific. When I debug/step-into, the code hangs at line 873 of hiveserver2.py.
Hive 1.1.0-cdh5.5.1 Python 2.7.11 | Anaconda 2.3.0 Impyla installed by pip today (2016-03-07).
Do you have any thoughts/suggestions?