BradRuderman / pyhs2


cur.getSchema() when <date> fields exist throws exception (when field is NULL)? #35

Open BAM-BAM-BAM opened 9 years ago

BAM-BAM-BAM commented 9 years ago

I've been using pyhs2 with a Hortonworks cluster.

I have a simple table with a "date" type field. When I call the cursor's getSchema() function, an exception is thrown (KeyError: 17). I think this only happens when a date field value is NULL.
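
As a quick sanity check, the Thrift type map bundled with pyhs2 can be inspected directly (only a sketch; I'm assuming TTypeId is importable from pyhs2.cursor, since the traceback below shows that module using it):

```python
# Sketch: list the primitive type ids that pyhs2's bundled Thrift bindings
# know about. Assumption: TTypeId is a module-level name in pyhs2.cursor,
# as the traceback below suggests.
from pyhs2.cursor import TTypeId

print(sorted(TTypeId._VALUES_TO_NAMES.items()))
print(17 in TTypeId._VALUES_TO_NAMES)  # the KeyError below implies this is False
```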

Here is the table:

```
$ hive -e 'describe extended test_date'
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
col_name        data_type       comment
mydate          date
_c1             bigint

Detailed Table Information   Table(tableName:test_date, dbName:default, owner:jprior, createTime:1422053397, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:mydate, type:date, comment:null), FieldSchema(name:_c1, type:bigint, comment:null)], location:hdfs://___.com:8020/apps/hive/warehouse/test_date, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=5, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1422053397, numRows=386, totalSize=6910, rawDataSize=6524}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 2.275 seconds, Fetched: 4 row(s)
```

Here is how to generate the exception:

ipython

```
In [1]: import pyhs2

In [2]: cnx = pyhs2.connect(eval("{'host':'', 'port':10000, 'authMechanism':'**', 'database':'default', 'user':'', 'password':'', }"))

In [3]: cur = cnx.cursor()

In [4]: query = 'select mydate from test_date'

In [5]: cur.execute(query)

In [6]: rows = cur.fetch()

In [7]: rows[:10]
Out[7]:
[[None],
 ['2013-12-31'],
 ['2014-01-05'],
 ['2014-01-10'],
 ['2014-01-15'],
 ['2014-01-20'],
 ['2014-01-25'],
 ['2014-01-30'],
 ['2014-02-04'],
 ['2014-02-09']]
```

```
In [8]: column_names = [a['columnName'] for a in cur.getSchema()]

KeyError                                  Traceback (most recent call last)
<ipython-input-8-...> in <module>()
----> 1 column_names = [a['columnName'] for a in cur.getSchema()]

/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.pyc in getSchema(self)
    196         for c in self.client.GetResultSetMetadata(req).schema.columns:
    197             col = {}
--> 198             col['type'] = get_type(c.typeDesc)
    199             col['columnName'] = c.columnName
    200             col['comment'] = c.comment

/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.pyc in get_type(typeDesc)
     10     for ttype in typeDesc.types:
     11         if ttype.primitiveEntry is not None:
---> 12             return TTypeId._VALUES_TO_NAMES[ttype.primitiveEntry.type]
     13         elif ttype.mapEntry is not None:
     14             return ttype.mapEntry

KeyError: 17
```
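
Until the bundled TCLIService bindings catch up with this Hive version, something like the following monkey-patch keeps getSchema() from blowing up on the unknown type id (only a sketch; the patching approach and the fallback label are my own workaround, not anything pyhs2 provides):

```python
# Sketch of a client-side workaround: tolerate primitive type ids that are
# missing from pyhs2's bundled Thrift map (e.g. 17 for date columns)
# instead of letting getSchema() raise KeyError.
import pyhs2.cursor as pyhs2_cursor

_orig_get_type = pyhs2_cursor.get_type

def _safe_get_type(typeDesc):
    try:
        return _orig_get_type(typeDesc)
    except KeyError as e:
        # Unknown type id; report a placeholder string (assumed label).
        return 'UNKNOWN_TYPE_%s' % e.args[0]

pyhs2_cursor.get_type = _safe_get_type
```

With that in place, cur.getSchema() returns the column dicts and just labels the date column with the placeholder type; regenerating the Thrift bindings against a newer Hive would presumably be the proper fix.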
BradRuderman commented 9 years ago

What version of Hive?

BAM-BAM-BAM commented 9 years ago

Hive 0.13.0.2.1.4.0-632