There we will see the dump of the internal schema of the table:
File Version: 0.12 with ORC_135
Rows: 62079
Compression: ZLIB
Compression size: 262144
Calendar: Julian/Gregorian
Type: struct<jobno:double,articlenumber:decimal(20,0),jobdesc:string,businessunit:string,quantityordered:decimal(7,0),quantitydelivered:decimal(7,0),status:string,working:string,pages:int,impressions:decimal(7,0),jobtypedesc:string,type:string,orderdate:string,requiredtime:string,lastopdesc:string,load_date:date>
[2].- Check the columns where the definition on ORC does not match the Hive definition, and alter these to match.
In the table example we have from the ORC that was provided, we should do in beeline:
alter table testorc change column articlenumber articlenumber decimal(20,0);
alter table testorc change column quantityordered quantityordered decimal(7,0);
alter table testorc change column quantitydelivered quantitydelivered decimal(7,0);
alter table testorc change column impressions impressions decimal(7,0);
With Hive 0.11.0 and 0.12.0, tables defined with a DECIMAL type don't have scale and precision defined.
For Hive 0.13.0 versions and above, default scale and precision values are applied in this scenario. The results will/may vary from previous versions.
This actually causes issues for Spark users using the metastore to get table definitions.
This is the solution to fix this for both Spark and Hive:
First we need to see which columns differ on their type. To find out execute
hive --orcfiledump /hdfsPathWhereTheORCFileOFTheTableSits/MyORCFile
There we will see the dump of the internal schema of the table: File Version: 0.12 with ORC_135 Rows: 62079 Compression: ZLIB Compression size: 262144 Calendar: Julian/Gregorian Type: struct<jobno:double,articlenumber:decimal(20,0),jobdesc:string,businessunit:string,quantityordered:decimal(7,0),quantitydelivered:decimal(7,0),status:string,working:string,pages:int,impressions:decimal(7,0),jobtypedesc:string,type:string,orderdate:string,requiredtime:string,lastopdesc:string,load_date:date>
[2].- Check the columns where the definition on ORC does not match the Hive definition, and alter these to match. In the table example we have from the ORC that was provided, we should do in beeline:
alter table testorc change column articlenumber articlenumber decimal(20,0); alter table testorc change column quantityordered quantityordered decimal(7,0); alter table testorc change column quantitydelivered quantitydelivered decimal(7,0); alter table testorc change column impressions impressions decimal(7,0);