dstreev / cloudera_upgrade_utils

Various tools to help plan HDP and CDH upgrades to CDP

Locate and Report on Hive DECIMAL issues #8

Closed: dstreev closed this issue 3 years ago

dstreev commented 3 years ago

In Hive 0.11.0 and 0.12.0, columns defined with the DECIMAL type do not have precision and scale defined.

From Hive 0.13.0 onward, a default precision and scale of decimal(10,0) is applied in this scenario, so results may differ from previous versions.
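As a minimal sketch of the behavior change (the table name ledger below is hypothetical):

-- Hive 0.11/0.12 accept a bare DECIMAL with no precision or scale:
CREATE TABLE ledger (amount DECIMAL) STORED AS ORC;

-- Hive 0.13+ interprets a bare DECIMAL as decimal(10,0), so after an
-- upgrade the same column is reported with the default precision/scale:
DESCRIBE ledger;
-- amount   decimal(10,0)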

This also causes issues for Spark users, since Spark reads table definitions from the metastore.

The following procedure fixes this for both Spark and Hive:

1. Find out which columns differ in type between the ORC files and the Hive table definition. Dump the internal schema of an ORC file with:

hive --orcfiledump /hdfsPathWhereTheORCFileOFTheTableSits/MyORCFile

There we will see the dump of the internal schema of the table:

File Version: 0.12 with ORC_135
Rows: 62079
Compression: ZLIB
Compression size: 262144
Calendar: Julian/Gregorian
Type: struct<jobno:double,articlenumber:decimal(20,0),jobdesc:string,businessunit:string,quantityordered:decimal(7,0),quantitydelivered:decimal(7,0),status:string,working:string,pages:int,impressions:decimal(7,0),jobtypedesc:string,type:string,orderdate:string,requiredtime:string,lastopdesc:string,load_date:date>
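For the Hive side of the comparison, pull the current table definition in beeline. The output below is illustrative, assuming the metastore fell back to the bare-DECIMAL default of decimal(10,0) after the upgrade:

DESCRIBE testorc;
-- articlenumber    decimal(10,0)   -- metastore default; the ORC file says decimal(20,0)
-- quantityordered  decimal(10,0)   -- the ORC file says decimal(7,0)
-- (remaining columns omitted)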

2. Check the columns whose ORC definition does not match the Hive definition, and alter them to match. For the example ORC schema above, run the following in beeline:

alter table testorc change column articlenumber articlenumber decimal(20,0);
alter table testorc change column quantityordered quantityordered decimal(7,0);
alter table testorc change column quantitydelivered quantitydelivered decimal(7,0);
alter table testorc change column impressions impressions decimal(7,0);
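Note that ALTER TABLE ... CHANGE COLUMN only rewrites the metastore definition; the ORC data files are untouched, which is why aligning the metastore to the files is safe here. Afterwards, DESCRIBE should report the corrected types (illustrative output):

DESCRIBE testorc;
-- articlenumber      decimal(20,0)
-- quantityordered    decimal(7,0)
-- quantitydelivered  decimal(7,0)
-- impressions        decimal(7,0)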

dstreev commented 3 years ago

reference case: 746002