Open Krinkle opened 1 year ago
Sorting data in-place while consuming nodes on the fly is a common way to handle large datasets while keeping memory usage low. It is often used for external sorting (sorting data that doesn't fit entirely in memory) and for processing large streams of data. To achieve this, you can use merge sort or another in-place sorting algorithm. Below is a simplified example in Python to demonstrate the concept:

def merge_sort(arr):
    # Recursively split the list, then merge the sorted halves back into arr.
    if len(arr) > 1:
        mid = len(arr) // 2
        left_half = arr[:mid]
        right_half = arr[mid:]

        merge_sort(left_half)
        merge_sort(right_half)

        # Merge the two sorted halves back into the original list.
        i = j = k = 0
        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                arr[k] = left_half[i]
                i += 1
            else:
                arr[k] = right_half[j]
                j += 1
            k += 1

        # Copy any remaining elements from either half.
        while i < len(left_half):
            arr[k] = left_half[i]
            i += 1
            k += 1

        while j < len(right_half):
            arr[k] = right_half[j]
            j += 1
            k += 1

data = [4, 2, 9, 6, 1, 5, 8, 7, 3]
merge_sort(data)
for item in data:
    print(item)
I measured both the performance and maximum RSS (as reported by GNU time) of this patch against a synthetic dataset, and it seems to reduce maximum RSS by about 2% while increasing runtime by about 5.4%:
config: def
avg: 5.544 ± 0.043
max rss: 227.585 ± 0.271
minor page faults: 36723.600 ± 62893.440
major page faults: 45.000 ± 0.000
config: free
avg: 5.848 ± 0.092
max rss: 223.104 ± 0.310
minor page faults: 35996.200 ± 175294.560
major page faults: 45.000 ± 0.000
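For reference, here is a rough sketch of how numbers like these might be collected. This is not the exact harness used above; it assumes GNU time is installed at /usr/bin/time (its -v report includes "Maximum resident set size (kbytes)"), and it runs flamegraph.pl against a hypothetical stacks.folded input:

import re
import statistics
import subprocess
import time

CMD = ["perl", "flamegraph.pl", "stacks.folded"]  # command under test (assumed paths)
ITERATIONS = 10

runtimes, max_rss_mb = [], []
for _ in range(ITERATIONS):
    start = time.perf_counter()
    # GNU time -v writes its resource report to stderr; discard the SVG on stdout.
    result = subprocess.run(
        ["/usr/bin/time", "-v", *CMD],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    runtimes.append(time.perf_counter() - start)
    rss_kb = int(re.search(r"Maximum resident set size \(kbytes\): (\d+)", result.stderr).group(1))
    max_rss_mb.append(rss_kb / 1024)

# Report mean +/- standard deviation, in seconds and MB respectively.
print(f"avg: {statistics.mean(runtimes):.3f} ± {statistics.stdev(runtimes):.3f}")
print(f"max rss: {statistics.mean(max_rss_mb):.3f} ± {statistics.stdev(max_rss_mb):.3f}")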
It may be that these ratios change significantly with different input; I see in your Phabricator posts that you're using much larger input. Is there a way I can download 2022-08-24.excimer.all.log (or a newer version of it) to try this out?
@matthew-olson-intel Yes! The raw data behind the visualisations at https://performance.wikimedia.org/php-profiling/ is published daily at https://performance.wikimedia.org/arclamp/logs/daily/.
The files with the .excimer.all.log suffix are the largest ones. I recommend using the ones from yesterday or earlier, as the current day is incomplete.
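In case it helps, a minimal download sketch in Python (the date-prefixed naming follows the file name mentioned above; whether the daily files are served uncompressed directly under that directory is an assumption, so check the index listing first):

from datetime import date, timedelta
from urllib.request import urlretrieve

BASE = "https://performance.wikimedia.org/arclamp/logs/daily/"
day = date.today() - timedelta(days=1)      # yesterday; the current day is incomplete
name = f"{day:%Y-%m-%d}.excimer.all.log"    # assumed naming pattern
urlretrieve(BASE + name, name)
print(f"downloaded {name}")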
@Krinkle OK, thanks -- I ran another round of experiments. These are all 10 iterations per config, reporting the mean (runtime in seconds, RSS in MB).
config: def
avg: 152.025 ± 1.992
max rss: 5974.135 ± 0.377
minor page faults: 1502767.100 ± 138277.290
major page faults: 45.000 ± 0.000
config: free
avg: 163.438 ± 15.326
max rss: 3134.559 ± 0.167
minor page faults: 783530.200 ± 1217912.160
major page faults: 45.000 ± 0.000
Seems to have changed things quite a lot: roughly a 48% reduction in max RSS for a 7.5% hit to runtime. A clear win for this input, IMO.
See https://phabricator.wikimedia.org/T315056
Use in-place sort for @Data and consume $Node on the fly to use less memory. This helped stave off out-of-memory errors in cron scripts that build SVGs from large log files.
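For readers unfamiliar with the pattern, here is a minimal Python sketch of the general idea; it is not the Perl change itself, and the (stack, count) node shape is assumed for illustration:

def emit_frames(data):
    # data: list of (stack, count) tuples collected from the log (assumed shape).
    data.sort()      # in-place sort; avoids keeping a second, sorted copy alive
    data.reverse()   # so pop() below yields items in ascending order
    while data:
        stack, count = data.pop()  # consume one node on the fly
        yield stack, count         # once the caller is done with it, it can be freed

# Usage sketch:
# for stack, count in emit_frames(entries):
#     render(stack, count)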
Re-creates PR https://github.com/brendangregg/FlameGraph/pull/298/ by @AaronSchulz.
Ref https://github.com/brendangregg/FlameGraph/issues/58.