Masterzach32 opened this issue 3 years ago
Could you please elaborate on the exact use case of the operator?
Implementing it is pretty straightforward indeed, but it can also be a bit problematic due to unbounded memory consumption, especially for potentially infinite flows.
We are not aiming to be a complete sequence replacement here, so it would be really nice to be sure it actually has its uses before adding it.
Sure thing.
One use case in my Discord bot is getting the artists from one or more Spotify playlists. Each playlist can have multiple tracks, each track can have multiple artists, and a playlist can contain several tracks by the same artist. For example:
api.browse.getFeaturedPlaylists(market = Market.US)
    .getAllAsFlowNotNull() // Flow<SimplePlaylist>
    .mapNotNull { api.playlists.getPlaylist(it.id, Market.US) } // Flow<Playlist>
    // get tracks from playlists, emitted in the flow in batches of 50.
    .flatMapConcat { it.tracks.getAllAsFlowNotNull() } // Flow<PlaylistTrack>
    .mapNotNull { it.track?.asTrack } // Flow<Track>
    .flatMapConcat { it.artists.asFlow() } // Flow<SimpleArtist>
    .distinctBy { it.id }
    .mapNotNull { api.artists.getArtist(it.id) } // Flow<Artist>
Here I use distinctBy to filter out duplicate artists before fetching the full Artist object from the Spotify API. This reduces the total number of HTTP requests.
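A minimal sketch of how a Flow version of distinctBy could look, assuming kotlinx.coroutines 1.6 or newer (where FlowCollector is a fun interface, so collect accepts a plain lambda). The name matches the operator used in the pipeline above, but here it is a hypothetical extension, not part of kotlinx.coroutines. It also shows the memory concern raised earlier: the set of seen keys only ever grows.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical extension, NOT part of kotlinx.coroutines: emits only the first
// element for each distinct key, like the stdlib Sequence.distinctBy.
fun <T, K> Flow<T>.distinctBy(selector: (T) -> K): Flow<T> = flow {
    // A fresh set per collection, so every collector gets its own deduplication.
    // It keeps every key seen so far, so memory grows with the number of
    // distinct keys and is unbounded for infinite flows.
    val seenKeys = HashSet<K>()
    collect { value ->
        // HashSet.add returns true only when the key was not present yet.
        if (seenKeys.add(selector(value))) emit(value)
    }
}

// Usage: keep the first fruit for each initial letter.
fun firstFruitPerLetter(): List<String> = runBlocking {
    flowOf("apple", "avocado", "banana", "blueberry", "cherry")
        .distinctBy { it.first() }
        .toList()
}
```

Creating the set inside the flow builder (rather than outside it) keeps the operator cold: collecting the same flow twice deduplicates each collection independently.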
I agree this operator would be useful; sometimes I don't want to collect duplicate data.
@RinOrz, sure, but the only way to avoid duplicates completely is to store every distinct key ever emitted, which can consume a lot of memory.
The use case above could also be served, for example, by a version of distinctBy that doesn't store everything but only remembers, let's say, a hundred entries. Most HTTP requests would still be avoided, but there would be a hard limit on the amount of memory this operator uses, which is… probably good? Or maybe people need the unbounded distinctBy much more often? Who can tell?
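A minimal sketch of the bounded idea, assuming kotlinx.coroutines 1.6 or newer. The name distinctByBounded, the maxKeys parameter, and the first-in-first-out eviction policy are all hypothetical choices for illustration; nothing like this exists in kotlinx.coroutines. Once a key is evicted, a later element with that key is emitted again, which is the trade-off for the hard memory limit.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical bounded variant, NOT part of kotlinx.coroutines: remembers at most
// maxKeys keys, so memory stays bounded but evicted keys can pass through again.
fun <T, K> Flow<T>.distinctByBounded(maxKeys: Int, selector: (T) -> K): Flow<T> {
    // Validate eagerly at the call site, not when the flow is collected.
    require(maxKeys > 0) { "maxKeys must be positive, was $maxKeys" }
    return flow {
        // LinkedHashSet preserves insertion order, so first() is the oldest key.
        val recentKeys = LinkedHashSet<K>()
        collect { value ->
            val key = selector(value)
            if (recentKeys.add(key)) {
                // Evict the oldest remembered key (FIFO) once the limit is exceeded.
                if (recentKeys.size > maxKeys) recentKeys.remove(recentKeys.first())
                emit(value)
            }
        }
    }
}

// Usage: deduplicate 1, 2, 3, 1 while remembering at most maxKeys keys.
fun boundedDemo(maxKeys: Int): List<Int> = runBlocking {
    flowOf(1, 2, 3, 1).distinctByBounded(maxKeys) { it }.toList()
}
```

With maxKeys = 2, key 1 is evicted when 3 arrives, so the trailing 1 is emitted again; with maxKeys = 3 it is filtered out. A least-recently-used policy (refreshing a key each time it is seen) would be another reasonable choice.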
This is to say that you can help the design process if you describe specifically when and why you want to avoid collecting duplicates.
Both List and Sequence have the distinctBy extension function.
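For comparison, a small sketch of the existing standard-library functions (stdlib only, no coroutines). ArtistRef is a hypothetical type used only for illustration. In both cases the first element for each key wins, and to guarantee no duplicates both have to remember every key seen so far, the same memory trade-off a Flow version would face.

```kotlin
// Hypothetical artist type, for illustration only.
data class ArtistRef(val id: String, val name: String)

val artistRefs = listOf(
    ArtistRef("a1", "First"),
    ArtistRef("a2", "Second"),
    ArtistRef("a1", "First again")
)

// Iterable.distinctBy is eager: it builds and returns a new List immediately.
fun distinctFromList(): List<ArtistRef> = artistRefs.distinctBy { it.id }

// Sequence.distinctBy is lazy: elements are filtered only as the sequence is consumed.
fun distinctFromSequence(): List<ArtistRef> =
    artistRefs.asSequence().distinctBy { it.id }.toList()
```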
Example implementation:
Is there a reason the Flow API does not have this? Would there be interest in such a function in the coroutines library? I have several use cases for it in my projects, and I'd assume others might find it useful as well. I could make a PR for this if there is interest.