I have a simple table with a "date"-typed field. When I call cursor.getSchema(), an exception is thrown (KeyError: 17). I think this only happens when a date field's value is NULL.
Here is the table:
hive -e 'describe extended test_date'
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
col_name data_type comment
mydate date
_c1 bigint
In [1]: import pyhs2
In [2]: cnx = pyhs2.connect(eval("{'host':'', 'port':10000, 'authMechanism':'**', 'database':'default', 'user':'', 'password':'', }"))
In [3]: cur = cnx.cursor()
In [4]: query = 'select mydate from test_date'
In [5]: cur.execute(query)
In [6]: rows = cur.fetch()
In [7]: rows[:10]
Out[7]:
[[None],
['2013-12-31'],
['2014-01-05'],
['2014-01-10'],
['2014-01-15'],
['2014-01-20'],
['2014-01-25'],
['2014-01-30'],
['2014-02-04'],
['2014-02-09']]
In [8]: column_names = [a['columnName'] for a in cur.getSchema()]
KeyError Traceback (most recent call last)
<ipython-input-8> in <module>()
----> 1 column_names = [a['columnName'] for a in cur.getSchema()]
/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.pyc in getSchema(self)
196 for c in self.client.GetResultSetMetadata(req).schema.columns:
197 col = {}
--> 198 col['type'] = get_type(c.typeDesc)
199 col['columnName'] = c.columnName
200 col['comment'] = c.comment
/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.pyc in get_type(typeDesc)
10 for ttype in typeDesc.types:
11 if ttype.primitiveEntry is not None:
---> 12 return TTypeId._VALUES_TO_NAMES[ttype.primitiveEntry.type]
13 elif ttype.mapEntry is not None:
14 return ttype.mapEntry
KeyError: 17
I've been using pyhs2 with a Hortonworks cluster.
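In case it helps triage: my guess is that the TCLIService TTypeId enum bundled with pyhs2 predates Hive's DATE type, so the server returns a type code (17) that has no entry in _VALUES_TO_NAMES. A defensive lookup in get_type would at least avoid the crash. Here is a minimal sketch of the idea; get_type_name and the table contents below are a hypothetical stand-in, not pyhs2's actual code:

```python
# The pyhs2 traceback above fails at TTypeId._VALUES_TO_NAMES[17].
# Stand-in enum table (NOT pyhs2's real one) to illustrate a defensive lookup.
_VALUES_TO_NAMES = {
    7: 'STRING_TYPE',
    8: 'TIMESTAMP_TYPE',
}

def get_type_name(code):
    """Return the enum name, or a placeholder instead of raising KeyError."""
    return _VALUES_TO_NAMES.get(code, 'UNKNOWN_TYPE_%d' % code)

print(get_type_name(8))   # a known code resolves normally
print(get_type_name(17))  # an unknown code no longer crashes getSchema()
```

If 17 really is DATE in the server's Thrift IDL, a one-line monkey-patch before calling getSchema(), such as TTypeId._VALUES_TO_NAMES.setdefault(17, 'DATE_TYPE'), might also unblock things, though I haven't verified that enum value.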
For reference, the detailed table information from the same describe extended call:
Detailed Table Information Table(tableName:test_date, dbName:default, owner:jprior, createTime:1422053397, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:mydate, type:date, comment:null), FieldSchema(name:_c1, type:bigint, comment:null)], location:hdfs://___.com:8020/apps/hive/warehouse/test_date, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=5, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1422053397, numRows=386, totalSize=6910, rawDataSize=6524}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) Time taken: 2.275 seconds, Fetched: 4 row(s)