apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Allow minor compaction for non-consecutive segments #9768

Open jihoonson opened 4 years ago

jihoonson commented 4 years ago

Each segment has a partitionId which uniquely identifies the segment within a time chunk. Currently, you cannot compact segments with minor compaction (which uses the segment lock) if their partitionIds are not consecutive. For example, you cannot compact the segments with partitionIds 0, 1, and 10 together, because partitionIds 1 and 10 are not consecutive.
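As a minimal illustration of this consecutiveness rule (a hypothetical helper, not Druid's actual code), a sorted set of partitionIds is eligible for minor compaction only if each ID follows its predecessor by exactly one:

```java
import java.util.List;

public class ConsecutiveCheck {
  // Hypothetical helper: returns true if the sorted partitionIds
  // form a single consecutive run (e.g. 3, 4, 5), which is what the
  // current minor compaction requires.
  static boolean isConsecutive(List<Integer> sortedPartitionIds) {
    for (int i = 1; i < sortedPartitionIds.size(); i++) {
      if (sortedPartitionIds.get(i) != sortedPartitionIds.get(i - 1) + 1) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isConsecutive(List.of(0, 1, 2)));   // true
    System.out.println(isConsecutive(List.of(0, 1, 10)));  // false: gap between 1 and 10
  }
}
```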

This is an expected limitation of minor compaction by design; it reduces memory footprint (see https://github.com/apache/druid/issues/7491 for more details). However, in practice, it would be nice if minor compaction could compact non-consecutive segments. This would be especially useful when streaming ingestion hits expected but transient task failures, since those failures can leave gaps in the partitionId sequence.

Minor compaction could support this if it were guaranteed that no new segment will be assigned a partitionId that falls within the overlapping root partition range of the existing segments.

jihoonson commented 3 years ago

Assuming that we keep the current segment ID allocation protocol, which monotonically increases the partitionId on task failures, the problem we want to solve is: given a missing partitionId, how do we know whether the segment of that ID really doesn't exist, or is being created by some other task? One way to do this is to modify the compaction task as below.

1) When some missing partitionIds are found, the compaction task tries to lock them using the regular locking mechanism.

2) If the locking succeeds, the compaction task can safely assume that those partitionIds will never be used, since there is no ingestion task creating segments with those partitionIds. In this case, the compaction task can simply ignore the missing partitionIds and compact the given segments all together.

3) If the locking fails, there must be some ingestion task creating segments with those partitionIds. In this case, the compaction task can split the input segments into multiple groups, where each group contains only consecutive partitionIds, and compact each group separately. The segments being created by other tasks can be compacted later by another compaction task.
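The fallback in step 3 can be sketched as follows; `splitIntoConsecutiveGroups` is a hypothetical helper, not actual Druid code, assuming the input partitionIds are already sorted:

```java
import java.util.ArrayList;
import java.util.List;

public class ConsecutiveGroups {
  // Hypothetical sketch of step 3: split sorted partitionIds into
  // maximal runs of consecutive IDs, so that each run can be handed
  // to a separate compaction.
  static List<List<Integer>> splitIntoConsecutiveGroups(List<Integer> sortedIds) {
    List<List<Integer>> groups = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    for (int id : sortedIds) {
      // Start a new group whenever the next ID is not predecessor + 1.
      if (!current.isEmpty() && id != current.get(current.size() - 1) + 1) {
        groups.add(current);
        current = new ArrayList<>();
      }
      current.add(id);
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // partitionIds 0, 1, 10 with the gap 2..9 still locked by another task
    System.out.println(splitIntoConsecutiveGroups(List.of(0, 1, 10)));
    // prints [[0, 1], [10]]
  }
}
```

Each resulting group satisfies the existing consecutive-partitionId requirement, so it can be compacted with the current minor compaction while the locked gap is left for a later compaction task.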