Closed: voonhous closed this issue 1 year ago.
Not sure if this is the correct approach, but should we prevent users from dropping a partition if there's a pending table service action on the partition path?
SparkDeletePartitionCommitActionExecutor#execute:

```java
public HoodieWriteMetadata<HoodieData<WriteStatus>> execute() {
  // ensure that there are no pending inflight clustering/compaction operations involving this partition
  SyncableFileSystemView fileSystemView = (SyncableFileSystemView) table.getSliceView();
  List<String> partitionPathsWithPendingInflightTableServiceActions = Stream
      .concat(fileSystemView.getPendingCompactionOperations(), fileSystemView.getPendingLogCompactionOperations())
      .map(op -> op.getRight().getPartitionPath())
      .distinct()
      .collect(Collectors.toList());
  partitionPathsWithPendingInflightTableServiceActions.addAll(
      fileSystemView.getFileGroupsInPendingClustering()
          .map(x -> x.getKey().getPartitionPath())
          .collect(Collectors.toList()));
  if (partitions.stream().anyMatch(partitionPathsWithPendingInflightTableServiceActions::contains)) {
    throw new HoodieDeletePartitionException("Failed to drop partitions. "
        + "Please ensure that there are no pending table service actions (clustering/compaction) for the "
        + "partitions to be deleted: " + partitions);
  }
  try {
    ...
```
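The check above boils down to a set-membership test between the partitions being dropped and the partitions touched by pending clustering/compaction. Below is a minimal, self-contained Java (16+) sketch of that idea, not Hudi's actual code: `PendingAction`, `PendingTableServiceException` and `validateNoPendingActions` are hypothetical stand-ins for Hudi's file-system view and `HoodieDeletePartitionException`. Unlike the excerpt, it also reports the offending instants, mirroring the format of the error message quoted later in this thread; the sample partitions and instants in `main` are illustrative only.

```java
import java.util.*;
import java.util.stream.*;

public class DropPartitionGuard {

  // Hypothetical stand-in for a pending clustering/compaction operation:
  // the partition it touches and the instant time of the pending action.
  record PendingAction(String partitionPath, String instantTime) {}

  // Hypothetical stand-in for HoodieDeletePartitionException.
  static class PendingTableServiceException extends RuntimeException {
    PendingTableServiceException(String msg) {
      super(msg);
    }
  }

  // Throws if any partition to be dropped has a pending table service action.
  static void validateNoPendingActions(List<String> partitions, List<PendingAction> pending) {
    // HashSet lookups are O(1), instead of scanning a list for every pending action.
    Set<String> toDrop = new HashSet<>(partitions);
    // TreeSet de-duplicates and sorts the instants so the message is stable.
    Set<String> offendingInstants = pending.stream()
        .filter(a -> toDrop.contains(a.partitionPath()))
        .map(PendingAction::instantTime)
        .collect(Collectors.toCollection(TreeSet::new));
    if (!offendingInstants.isEmpty()) {
      throw new PendingTableServiceException("Failed to drop partitions. "
          + "Please ensure that there are no pending table service actions (clustering/compaction) for the "
          + "partitions to be deleted: " + partitions
          + ". Instant(s) of offending pending table service action: " + offendingInstants);
    }
  }

  public static void main(String[] args) {
    // Illustrative sample data only.
    List<PendingAction> pending = List.of(
        new PendingAction("ts=1002", "20230512161603721"),
        new PendingAction("ts=1003", "20230101000000000"));
    validateNoPendingActions(List.of("ts=1001"), pending); // no overlap, so no exception
    try {
      validateNoPendingActions(List.of("ts=1002"), pending);
    } catch (PendingTableServiceException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Dropping `ts=1001` passes silently, while dropping `ts=1002` is rejected and the message names the pending instant that blocks it, so the user knows which action to complete or roll back first.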
This is not a fix for the root cause, but #7669 adds a check that blocks such drops and raises an informative error message explaining how to remedy the situation.
@voonhous Confirmed with the master code that the fix works: dropping a partition is now rejected while any compaction/clustering on it is still pending.
org.apache.hudi.exception.HoodieDeletePartitionException: Failed to drop partitions. Please ensure that there are no pending table service actions (clustering/compaction) for the partitions to be deleted: [ts=1002]. Instant(s) of offending pending table service action: [20230512161603721]
Closing this issue, as this appears to be a reasonable fix for the problem.
Describe the problem you faced
TLDR
Dropping a partition while a table service action (clustering/compaction) is pending on it may cause downstream write errors and data inconsistencies.
Detailed explanation
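Downstream errors surface when AbstractTableFileSystemView#resetFileGroupsReplaced is invoked.
Root cause
Assume that partition=a only has filegroup_1:
1. A clustering action is pending on partition=a/filegroup_1.
2. partition=a is dropped, which marks filegroup_1 as replaced.
3. The pending clustering then executes on partition=a/filegroup_1 to produce partition=a/filegroup_2.
4. partition=a/filegroup_2 remains visible in a partition that was supposed to be dropped (if it somehow manages to bypass AbstractTableFileSystemView#resetFileGroupsReplaced).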
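To make the race concrete, here is a toy Java (16+) model of the scenario; every name in it is hypothetical and it is not Hudi code. A partition starts with one file group, a clustering on it is scheduled, the partition is dropped (its replacecommit replaces filegroup_1), and the clustering then completes, writing filegroup_2 and replacing filegroup_1 a second time. It assumes, without confirming it here, that the file-system view rebuilds its "replaced file group" map with something like `Collectors.toMap`, which throws `IllegalStateException` on a duplicate key; the lenient variant shows what happens if that failure is bypassed.

```java
import java.util.*;
import java.util.stream.*;

public class DropPartitionRace {

  // Hypothetical model of a completed replacecommit and the file groups it replaced.
  record ReplaceCommit(String instant, List<String> replacedFileGroups) {}

  // Strict rebuild: Collectors.toMap throws IllegalStateException on a duplicate key.
  static Map<String, String> replacedStrict(List<ReplaceCommit> timeline) {
    return timeline.stream()
        .flatMap(c -> c.replacedFileGroups().stream().map(fg -> Map.entry(fg, c.instant())))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  // Lenient rebuild: keeps the first replacing instant instead of failing.
  static Map<String, String> replacedLenient(List<ReplaceCommit> timeline) {
    return timeline.stream()
        .flatMap(c -> c.replacedFileGroups().stream().map(fg -> Map.entry(fg, c.instant())))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (first, second) -> first));
  }

  // File groups of a partition that have not been replaced.
  static List<String> visible(String partition, Map<String, String> fgToPartition,
                              Map<String, String> replaced) {
    return fgToPartition.entrySet().stream()
        .filter(e -> e.getValue().equals(partition) && !replaced.containsKey(e.getKey()))
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // partition=a initially only has filegroup_1; a clustering on it is scheduled (pending).
    Map<String, String> fgToPartition = new LinkedHashMap<>(Map.of("filegroup_1", "partition=a"));
    List<ReplaceCommit> timeline = new ArrayList<>();

    // Drop partition=a: its replacecommit replaces filegroup_1.
    timeline.add(new ReplaceCommit("drop_partition", List.of("filegroup_1")));

    // The pending clustering then completes: it writes filegroup_2 and also replaces filegroup_1.
    fgToPartition.put("filegroup_2", "partition=a");
    timeline.add(new ReplaceCommit("clustering", List.of("filegroup_1")));

    try {
      replacedStrict(timeline);
    } catch (IllegalStateException e) {
      System.out.println("strict rebuild failed: " + e.getMessage());
    }

    // If the duplicate is tolerated, the "dropped" partition still exposes filegroup_2.
    System.out.println(visible("partition=a", fgToPartition, replacedLenient(timeline)));
  }
}
```

Either outcome matches the TLDR: the strict rebuild turns into an error on later reads/writes, and the lenient one silently resurrects data in a dropped partition. Blocking the drop while the clustering is pending avoids both.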
Environment Description
Hudi version : 0.13.0
Spark version : N.A
Hive version : N.A
Hadoop version : N.A