apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0
5.45k stars 2.43k forks source link

[MINOR] Fix NPE while parsing partition values in file index #12177

Closed codope closed 3 weeks ago

codope commented 3 weeks ago

Change Logs

For timestamp keygen, timestamp type was recently added to the table config. In case this property is not present in table config, we still need to be backward compatible and fallback to the usual way of parsing partition values. Otherwise, NPE is thrown as below. This PR adds the null check.

Caused by: java.lang.NullPointerException
    at org.apache.hudi.SparkHoodieTableFileIndex.doParsePartitionColumnValues(SparkHoodieTableFileIndex.scala:381)
    at org.apache.hudi.BaseHoodieTableFileIndex.parsePartitionColumnValues(BaseHoodieTableFileIndex.java:363)
    at org.apache.hudi.BaseHoodieTableFileIndex.convertToPartitionPath(BaseHoodieTableFileIndex.java:487)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.hudi.BaseHoodieTableFileIndex.listPartitionPaths(BaseHoodieTableFileIndex.java:335)
    at org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:219)
    at org.apache.hudi.SparkHoodieTableFileIndex.listMatchingPartitionPaths(SparkHoodieTableFileIndex.scala:197)
    at org.apache.hudi.HoodieFileIndex.prunePartitionsAndGetFileSlices(HoodieFileIndex.scala:346)
    at org.apache.hudi.HoodieFileIndex.filterFileSlices(HoodieFileIndex.scala:240)
    at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(HoodiePruneFileSourcePartitions.scala:55)
    at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(HoodiePruneFileSourcePartitions.scala:43)

Impact

bug fix

Risk level (write none, low medium or high below)

low

affects timestamp keygen only with certain outout types.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

Contributor's checklist

hudi-bot commented 3 weeks ago

CI report:

Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build