amaembo / streamex

Enhancing Java Stream API
Apache License 2.0
2.18k stars 249 forks source link

Merge sort and limit operators #228

Closed numeralnathan closed 4 years ago

numeralnathan commented 4 years ago

Let's say we have the following stream pattern...

...
sorted().
limit(k).
...

If the input has millions of rows, then the sort operation is going to do O(n log n) operations. If the sort and limit operations were merged, then the sort operation only has to do O(n log k) operations. If the smallest elements are encountered first, then the sort operation only has to do O(n + k log k) operations.

This could also reduce the memory usage from O(n) to O(k).

amaembo commented 4 years ago

I don't like doing this in StreamEx, because it will slightly change the behavior of limit() operation making it quasi-intermediate. If fixing this, it should be fixed on JDK level. I tried to fix this in JDK but wasn't enthusiastic enough to finish this work. See: https://bugs.openjdk.java.net/browse/JDK-8153332 You can use MoreCollectors.greatest() and MoreCollectors.least(). These terminal operations already have this optimization.