Where is `reduce_runs`? I think it would be better to add the code link in the issue.
Right now I do a quick and dirty workaround by taking only the first 1024 combinations, like this:

```rust
...
runs.into_iter()
    .combinations(k)
    .take(1024) // here
    .map(|runs_to_merge| merge_all_runs(runs_to_merge))
...
```
It may not produce the minimal-penalty result, but the code execution can proceed. I can polish it a bit, but WDYT about the idea? @evenyag @v0y4g3r
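For reference, a self-contained toy showing that the cap bounds the work even at the scale reported in this issue (plain integers stand in for runs, and the merge/penalty logic is omitted; assumes only the itertools crate):

```rust
use itertools::Itertools;

fn main() {
    // Stand-in for `runs`: 2883 items, as in the reported case.
    // C(2883, 5) ≈ 1.65e15 combinations exist in total.
    let runs: Vec<u32> = (0..2883).collect();
    let k = 5;

    // `combinations` is lazy, so `.take(1024)` bounds how many
    // candidate merges are ever materialized and scored.
    let evaluated = runs.iter().combinations(k).take(1024).count();

    assert_eq!(evaluated, 1024);
    println!("evaluated {evaluated} candidate merges");
}
```

Because `combinations` yields candidates lazily in index order, the cap only explores combinations built from the earliest runs, which is part of why the result may not be the minimal-penalty one.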
Normally the number of combinations won't be that large; C(10, 4) is 210. The problem is why there are 2883 runs. Can you provide the time ranges of all the files?
@v0y4g3r The SSTs were ingested directly, in the early days before this `reduce_runs` feature was added to our codebase.
Looks like ingesting a large number of files with overlapping time ranges is not the original use case of any existing compaction strategy; they all expect overlapping files to gradually get merged into one or several files as they age. We should come up with a simpler strategy that always merges them into one file if their total size does not exceed some threshold.
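A rough sketch of what such a strategy could look like (hypothetical types and threshold, not the actual mito2 compaction API):

```rust
/// Hypothetical, minimal view of an SST file: only its size matters here.
struct FileMeta {
    size_bytes: u64,
}

/// If the overlapping files are collectively small enough, merge them all
/// into a single output in one pass; otherwise defer to the existing
/// run-based strategy. The threshold value is just a placeholder.
fn merge_all_if_small(files: &[FileMeta], max_output_bytes: u64) -> Option<Vec<usize>> {
    let total: u64 = files.iter().map(|f| f.size_bytes).sum();
    if total <= max_output_bytes {
        // Select every input file for a single merge task.
        Some((0..files.len()).collect())
    } else {
        // Too large for one output; fall back to the regular strategy.
        None
    }
}
```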
Only early GreptimeDB deployments that didn't run compaction after ingesting SSTs are affected, so it's not a big issue.
What type of bug is this?
Performance issue
What subsystems are affected?
Datanode
Minimal reproduce step
The method `reduce_runs` (in https://github.com/GreptimeTeam/greptimedb/blob/027284ed1b46a02375b9e08341f54e4a27ce0ed5/src/mito2/src/compaction/run.rs#L250) iterates over the combinations of `runs: Vec<SortedRun<T>>` with `let k = runs.len() + 1 - target;`. The number of such combinations is C(n, k)
, which can be very expensive to iterate. For example, C(300, 4) = 330791175. Iterating over that many combinations can keep the code running for far too long.

I've observed this situation with one of our customers. He has so many SST files with overlapping time ranges that the `runs` passed to `reduce_runs` are very large. See this log (not in our codebase; I temporarily added it for debugging) right before executing `reduce_runs`:

So the number of combinations in his `sorted_runs` is C(2883, 5) = 1653997252262256, which is far too big to iterate. The code's execution acts like it hangs forever.
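For scale, a standalone check of how quickly the combination count grows (exact binomial coefficients via the multiplicative formula; not part of the GreptimeDB codebase):

```rust
/// Exact binomial coefficient C(n, k); u128 is ample for these inputs.
fn binomial(n: u128, k: u128) -> u128 {
    let k = k.min(n - k);
    // Multiply incrementally: after step i the accumulator is C(n - k + i, i),
    // so every division is exact.
    (1..=k).fold(1u128, |acc, i| acc * (n - k + i) / i)
}

fn main() {
    println!("C(10, 4)   = {}", binomial(10, 4));   // 210
    println!("C(300, 4)  = {}", binomial(300, 4));  // 330791175
    println!("C(2883, 5) = {}", binomial(2883, 5)); // 1653997252262256
}
```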
What did you expect to see?
Code execution proceeds.
What did you see instead?
Code execution hangs.
What operating system did you use?
Ubuntu
What version of GreptimeDB did you use?
main
Relevant log output and stack trace
No response