apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Partition Filter returns incorrect results for decimal partition columns with trailing 0's #7882

Closed JakeMcPherson25 closed 2 months ago

JakeMcPherson25 commented 1 year ago

Apache Iceberg version

1.3.0 (latest release)

Query engine

Hive

Please describe the bug 🐞

Create an Iceberg table with a decimal identity partition column, for example decimal_col of type decimal(5,4). When predicates are applied to the table, HiveIcebergInputFormat.getSplits deserializes the Hive filter (hiveFilter) and converts it to an Iceberg Expression:

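    // Excerpt from HiveIcebergInputFormat.getSplits
    String hiveFilter = job.get(TableScanDesc.FILTER_EXPR_CONF_STR);
    if (hiveFilter != null) {
      // Deserialize the Hive predicate and convert it to a SearchArgument
      ExprNodeGenericFuncDesc exprNodeDesc =
          SerializationUtilities.deserializeObject(hiveFilter, ExprNodeGenericFuncDesc.class);
      SearchArgument sarg = ConvertAstToSearchArg.create(job, exprNodeDesc);
      try {
        // Translate the SearchArgument into an Iceberg Expression
        Expression filter = HiveIcebergFilterFactory.generateFilterExpression(sarg);
        // ... (rest of getSplits elided)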

However, when Hive generates this expression, trailing zeros are trimmed from the decimal literals in the predicate, e.g. 1.1000 becomes 1.1 with a scale of 1. This comes from how Hive creates a HiveDecimal, which eliminates trailing zeros here: https://github.com/apache/hive/blob/d31086c0a74b8bb48db774379ce6b7ab7d9233ff/storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java#L5955
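A minimal sketch of the scale change, using plain java.math.BigDecimal rather than HiveDecimal. The helper trimLikeHive is hypothetical: it assumes BigDecimal.stripTrailingZeros() is a reasonable stand-in for the trimming described above (it may differ for whole numbers, e.g. 100 becomes 1E+2 with a negative scale, which HiveDecimal does not produce).

```java
import java.math.BigDecimal;

class TrailingZeroDemo {
  // Hypothetical helper: approximates the trailing-zero trimming the issue
  // attributes to HiveDecimal, using BigDecimal.stripTrailingZeros() as a stand-in.
  static BigDecimal trimLikeHive(BigDecimal value) {
    return value.stripTrailingZeros();
  }

  public static void main(String[] args) {
    // A value of a decimal(5,4) column always carries scale 4.
    BigDecimal partitionValue = new BigDecimal("1.1000");
    BigDecimal filterLiteral = trimLikeHive(partitionValue);

    System.out.println(filterLiteral + " (scale " + filterLiteral.scale() + ")"); // 1.1 (scale 1)
    // Numerically the two values are the same...
    System.out.println(partitionValue.compareTo(filterLiteral) == 0); // true
    // ...but equals() also compares scale, so an exact match fails.
    System.out.println(partitionValue.equals(filterLiteral)); // false
  }
}
```

The point is that compareTo treats 1.1000 and 1.1 as equal, while equals (and anything built on it, such as hash-based lookups) does not.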

This causes partition elimination to fail for IN predicates, because the IN check in the Evaluator class is a strict match, so a literal of 1.1 (scale 1) never matches a partition value of 1.1000 (scale 4): https://github.com/apache/iceberg/blob/97d00f9eb3d9ea65e19069794dae2d1d1634d279/api/src/main/java/org/apache/iceberg/expressions/Evaluator.java#L141
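The strict-match failure can be reproduced with a hash set of BigDecimal literals. This is a sketch under stated assumptions, not Iceberg's Evaluator code: inStrict is a hypothetical stand-in that assumes the IN check behaves like Set.contains, and toColumnScale is a hypothetical workaround that rescales each literal to the column's scale before the lookup.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StrictInDemo {
  // Hypothetical stand-in for a set-based IN check: HashSet.contains relies on
  // BigDecimal.equals/hashCode, which are scale-sensitive.
  static boolean inStrict(BigDecimal value, Set<BigDecimal> literals) {
    return literals.contains(value);
  }

  // Hypothetical workaround: rescale every literal to the column's scale.
  static Set<BigDecimal> toColumnScale(Set<BigDecimal> literals, int columnScale) {
    Set<BigDecimal> rescaled = new HashSet<>();
    for (BigDecimal literal : literals) {
      try {
        // UNNECESSARY throws if rescaling would require rounding.
        rescaled.add(literal.setScale(columnScale, RoundingMode.UNNECESSARY));
      } catch (ArithmeticException e) {
        // Not representable at this scale, so it cannot equal any column value; drop it.
      }
    }
    return rescaled;
  }

  public static void main(String[] args) {
    BigDecimal partitionValue = new BigDecimal("1.1000"); // decimal(5,4) partition value
    Set<BigDecimal> hiveLiterals = new HashSet<>(Arrays.asList(new BigDecimal("1.1")));

    System.out.println(inStrict(partitionValue, hiveLiterals)); // false: scale 4 vs scale 1
    System.out.println(inStrict(partitionValue, toColumnScale(hiveLiterals, 4))); // true
  }
}
```

Rescaling the literals once, instead of comparing each pair with compareTo, keeps the lookup hash-based; dropping literals that need rounding is safe for IN because such a literal can never equal a value stored at the column's scale.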

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] commented 2 months ago

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'