amaembo / streamex

Enhancing Java Stream API
Apache License 2.0

StreamEx.distinct(mapper, binaryOperator) #212

Closed mmariotti closed 4 years ago

mmariotti commented 4 years ago

Hello Tagir, lately I stumbled upon a variant of the "greatest-n-per-group" problem, and I wonder whether it would be possible to add a distinct() method that selects the surviving element deterministically, using either a BinaryOperator or a Comparator.

This is the code I'm currently using:

Map<Region, Record> latestRecordPerRegionMap = StreamEx.of(allRecords)
    .toMap(
        Record::getRegion,
        Function.identity(),
        BinaryOperator.maxBy(Comparator.comparing(Record::getRegistered)));

List<Record> latestRecords = StreamEx.ofValues(latestRecordPerRegionMap)
    .toList();

If possible, I'd like something like:

List<Record> latestRecords = StreamEx.of(allRecords)
    .distinct(Record::getRegion, BinaryOperator.maxBy(Comparator.comparing(Record::getRegistered)))
    .toList();

If you come up with an alternative, suggestions are welcome.

Thank you

amaembo commented 4 years ago

I believe you wanted .distinct(Record::getRegion, BinaryOperator.maxBy(Comparator.comparing(Record::getRegistered)))?

mmariotti commented 4 years ago

Oh, yes. My bad.

amaembo commented 4 years ago

It's impossible to implement this kind of operation in a streaming way. You cannot emit any element to the downstream operation until you have collected all the elements from the upstream, because any later element may replace the current winner for its key. So in fact this operation would be terminal-like: it would collect everything into an internal Map. I believe it's better to do this explicitly, as you did via the toMap collector. While it's more verbose, it clearly shows that the stream ends here and you need to start another stream.

Only the `sorted()` operation and some flavors of it behave like this. I don't like adding more "full barrier" operations of this kind.
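To make the "full barrier" point concrete, here is a minimal sketch using only the plain JDK `Stream` (not StreamEx). The class name `BarrierDemo` and the method `trace` are made up for illustration. It logs when each element passes above and below an optional `sorted()` stage in a sequential stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Illustrative only: BarrierDemo and trace are hypothetical names.
class BarrierDemo {
    /** Records the order in which elements pass above ("up") and below ("down") the stage. */
    static List<String> trace(boolean withSorted) {
        List<String> log = new ArrayList<>();
        Stream<Integer> upstream = Stream.of(3, 1, 2).peek(x -> log.add("up " + x));
        // sorted() must see every upstream element before it can emit the first one.
        Stream<Integer> stage = withSorted ? upstream.sorted() : upstream;
        stage.forEach(x -> log.add("down " + x));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // [up 3, down 3, up 1, down 1, up 2, down 2]
        System.out.println(trace(true));  // [up 3, up 1, up 2, down 1, down 2, down 3]
    }
}
```

Without `sorted()` each element flows straight through. With it, the whole upstream is consumed first, which is the behavior a merging `distinct` would also need.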

amaembo commented 4 years ago

Btw, note that you can always create a static method in your project returning Function<StreamEx, StreamEx> (type parameters omitted) and use it like .chain(myDistinct(Record::getRegion, BinaryOperator.maxBy(Comparator.comparing(Record::getRegistered)))).
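A rough sketch of what such a helper could look like follows. It is written against the plain JDK `Stream` so that it is self-contained. `myDistinct` is the hypothetical name from the comment above, not a StreamEx method. `Rec` is a made-up stand-in for the `Record` class in this thread, with an `int` timestamp assumed for simplicity.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MyDistinctSketch {
    /** Hypothetical stand-in for the Record class from the thread. */
    static final class Rec {
        final String region;
        final int registered;
        Rec(String region, int registered) { this.region = region; this.registered = registered; }
        String getRegion() { return region; }
        int getRegistered() { return registered; }
    }

    /**
     * Hypothetical helper: keeps one element per key, resolving collisions with the merger.
     * It drains the whole upstream into a map before emitting anything (a full barrier).
     * LinkedHashMap keeps keys in first-encounter order.
     */
    static <T, K> Function<Stream<T>, Stream<T>> myDistinct(
            Function<? super T, ? extends K> keyMapper, BinaryOperator<T> merger) {
        return stream -> {
            Map<K, T> byKey = stream.collect(Collectors.toMap(
                    keyMapper, Function.identity(), merger, LinkedHashMap::new));
            return byKey.values().stream();
        };
    }

    static List<String> demo() {
        Function<Stream<Rec>, Stream<Rec>> latestPerRegion = myDistinct(
                Rec::getRegion, BinaryOperator.maxBy(Comparator.comparingInt(Rec::getRegistered)));
        Stream<Rec> all = Stream.of(
                new Rec("EU", 1), new Rec("US", 5), new Rec("EU", 7), new Rec("US", 2));
        return latestPerRegion.apply(all)
                .map(r -> r.getRegion() + "@" + r.getRegistered())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [EU@7, US@5]
    }
}
```

With StreamEx, the same body could presumably be typed as `Function<StreamEx<T>, StreamEx<T>>` (wrapping the result with `StreamEx.of(...)`) so that `.chain(myDistinct(...))` keeps returning a `StreamEx`. Note that `Collectors.toMap` rejects null values, so this sketch assumes non-null elements.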

mmariotti commented 4 years ago

I suspected it was impossible without actually consuming the whole stream. Thanks anyway :)