Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
union type not work #488

Open irvinren opened 2 years ago

irvinren commented 2 years ago

The Hive union type (uniontype) does not work with this client: selecting from a table that has a uniontype column raises KeyError: 'UNION'.


CREATE TABLE all_types(
  tiny tinyint,
  small smallint,
  normal int,
  big bigint,
  flt float,
  dbl double,
  dp double,
  dcm decimal(10,5),
  dt date,
  str string,
  vrcr varchar(65535),
  chr char(255),
  bl boolean,
  bnr binary,
  arr_str array,
  mp map<string,string>,
  inner_str struct<col1:int,col2:string>,
  unn uniontype<int,string>)
PARTITIONED BY (
  ts timestamp)
CLUSTERED BY (
  big)
INTO 128 BUCKETS
ROW FORMAT SERDE ''
STORED AS INPUTFORMAT '' OUTPUTFORMAT ''
LOCATION 'hdfs://PhilMacBook:8020/Users/philren/IdeaProjects/spark/data/warehouse/all_types'
TBLPROPERTIES (
  'transient_lastDdlTime'='1644917028')

conn = connect(host='localhost',port=10000,auth_mechanism='PLAIN')
cursor = conn.cursor()
cursor.execute('SELECT * FROM all_types')
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/philren/.local/share/virtualenvs/spark-examples--HrH57AW/lib/python3.6/site-packages/impala/", line 588, in next
    return
  File "/Users/philren/.local/share/virtualenvs/spark-examples--HrH57AW/lib/python3.6/site-packages/impala/", line 591, in next
    self._ensure_buffer_is_filled()
  File "/Users/philren/.local/share/virtualenvs/spark-examples--HrH57AW/lib/python3.6/site-packages/impala/", line 607, in _ensure_buffer_is_filled
    convert_types=self.convert_types)
  File "/Users/philren/.local/share/virtualenvs/spark-examples--HrH57AW/lib/python3.6/site-packages/impala/", line 1395, in fetch
    convert_types=convert_types)
  File "/Users/philren/.local/share/virtualenvs/spark-examples--HrH57AW/lib/python3.6/site-packages/impala/", line 1400, in _wrap_results
    return CBatch(results, expect_more_rows, schema, convert_types=convert_types)
  File "/Users/philren/.local/share/virtualenvs/spark-examples--HrH57AW/lib/python3.6/site-packages/impala/", line 950, in __init__
    for (i, col) in enumerate(trowset.columns)]
  File "/Users/philren/.local/share/virtualenvs/spark-examples--HrH57AW/lib/python3.6/site-packages/impala/", line 950, in
    for (i, col) in enumerate(trowset.columns)]
KeyError: 'UNION'

line 745 -> 765

_TTypeId_to_TColumnValue_getters = {
    'BOOLEAN': operator.attrgetter('boolVal'),
    'TINYINT': operator.attrgetter('byteVal'),
    'SMALLINT': operator.attrgetter('i16Val'),
    'INT': operator.attrgetter('i32Val'),
    'BIGINT': operator.attrgetter('i64Val'),
    'TIMESTAMP': operator.attrgetter('stringVal'),
    'FLOAT': operator.attrgetter('doubleVal'),
    'DOUBLE': operator.attrgetter('doubleVal'),
    'STRING': operator.attrgetter('stringVal'),
    'DECIMAL': operator.attrgetter('stringVal'),
    'BINARY': operator.attrgetter('binaryVal'),
    'VARCHAR': operator.attrgetter('stringVal'),
    'CHAR': operator.attrgetter('stringVal'),
    'MAP': operator.attrgetter('stringVal'),
    'ARRAY': operator.attrgetter('stringVal'),
    'STRUCT': operator.attrgetter('stringVal'),
--> 'UNIONTYPE': operator.attrgetter('stringVal'),
    'NULL': operator.attrgetter('stringVal'),
    'DATE': operator.attrgetter('stringVal')
}

Shouldn't the key be 'UNION' instead of 'UNIONTYPE', i.e.:

'UNION': operator.attrgetter('stringVal')
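
To illustrate the mismatch outside the client, here is a minimal sketch. It assumes, based only on the KeyError in the traceback, that the type name arriving at the lookup is 'UNION'. `FakeColumnValue`, `getters`, and `get_value` are hypothetical stand-ins made up for this example and are not part of impyla.

```python
import operator
from collections import namedtuple

# Hypothetical stand-in for a Thrift column value; only the field used here.
FakeColumnValue = namedtuple("FakeColumnValue", ["stringVal"])

# Trimmed, illustrative copy of the getter table quoted above.
getters = {
    "STRUCT": operator.attrgetter("stringVal"),
    "UNIONTYPE": operator.attrgetter("stringVal"),
}


def get_value(type_name, column_value):
    """Look up the getter registered for type_name and apply it."""
    return getters[type_name](column_value)


value = FakeColumnValue(stringVal="{0:1}")

# Assumption: the lookup key is 'UNION', as the KeyError suggests.
try:
    get_value("UNION", value)
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Proposed change: register the getter under 'UNION' as well.
getters["UNION"] = operator.attrgetter("stringVal")
fixed = get_value("UNION", value)
```

With only the 'UNIONTYPE' key present the lookup raises KeyError; once 'UNION' is registered, the same call returns the string value. Whether the real table should keep both keys or only 'UNION' is a question for the maintainers.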


csringhofer commented 2 years ago

Hi! I started looking at this issue and realized that UNIONTYPE is not fully supported even in Hive:

Meanwhile, Impala does not support it at all. This also means that adding tests is a bit tricky right now, as nearly all tests use Impala and we only connect to Hive.

What is the aim of union type support? Is it for the sake of completeness, or is someone actually using it? It would be good to know whether there are people who actually use this type in real life.