Closed elizarov closed 4 years ago
This looks good as a #1086 implementation. One thing missing is the sharing timeout, whose use case is described in https://github.com/Kotlin/kotlinx.coroutines/issues/1086#issuecomment-485675107
One question about 'starting' a shared/published flow through launch { flow.collect {} }.
In #1086, starting a shared Flow could only happen on a Flow that was already shared, because connect() could only be invoked on a ConnectableFlow. Any subsequent subscribers/consumers/collectors of this shared flow would not restart the producer of the flow.
Say I want to create an extension function out of launch { flow.collect {} }, called connect:
fun <T> Flow<T>.connect(scope: CoroutineScope) = scope.launch { collect {} }
I can't be sure that the receiver of the connect function is a shared/published Flow.
The argument for exposing a shared/published flow as a separate type, as a ConnectableFlow, is that I can then write this extension function as follows:
fun <T> ConnectableFlow<T>.connect(scope: CoroutineScope) = scope.launch { collect {} }
and non-shared/non-published Flows won't be able to get 'connect'ed.
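The type-based approach argued for here can be sketched as follows. Note that ConnectableFlow is a hypothetical marker interface invented for this sketch, not an API from the library:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical marker type: only flows that are already shared/published
// would implement ConnectableFlow.
interface ConnectableFlow<T> : Flow<T>

// connect() is restricted to ConnectableFlow, so a plain (cold) Flow
// cannot be "connected" by mistake.
fun <T> ConnectableFlow<T>.connect(scope: CoroutineScope): Job =
    scope.launch { collect {} }

fun main() = runBlocking {
    // Stand-in implementation that just delegates to an ordinary flow.
    val src = flowOf(1, 2, 3)
    val connectable = object : ConnectableFlow<Int>, Flow<Int> by src {}
    connectable.connect(this).join()
    println("connected flow completed")
}
```

With this shape, `someColdFlow.connect(scope)` simply does not compile, which is the safety property the comment is after.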
What happens to the cached "history" when moving from one to zero collectors (cancelling the upstream collection), then back to one? Does the history get preserved across upstream collections or lost?
@zach-klippenstein That is why Rx does not use share() for a shared cache, but cache(). The number of subscribers governs when the flow/stream starts, but it doesn't govern when the flow/stream stops.
With Rx's ConnectableObservable's refCount(), when there is at least 1 subscriber, the Flowable starts. When there are no more subscribers, the Flowable stops.
With Rx's ConnectableObservable's autoConnect(), when there is at least 1 subscriber, the Flowable starts, but even when there are no more subscribers, the Flowable continues; the 'history' won't get lost.
What is unclear is what shall be the default value (if any).
To me, the name "share" implies only multicasting, so zero would be a reasonable default. The caching/replaying behavior is an additional feature that is often used along with multicasting, but not necessarily implied by the name.
@streetsofboston That's how Rx works, yes, but I was asking for clarification about this operator because I didn't see it specified in the issue description.
I believe share(0) as proposed here would be equivalent to RxJava's share() (i.e. publish().refCount()), and share(n) would be equivalent to RxJava's replay(n).refCount().
The behavior of saving the cache across "connections" would be similar to Jake Wharton's RxReplayingShare.
What happens to the cached "history" when moving from one to zero collectors (cancelling the upstream collection), then back to one? Does the history get preserved across upstream collections or lost?
I would argue that this should be an option, since it could be useful to have it preserve data across cancellation (for example, to use this flow as a memory cache if data takes a while to load after subscribing), or not useful (for example, when data from the flow changes frequently, rendering old values obsolete), depending on the context.
What is unclear is what shall be the default value (if any).
I think that there shouldn't be any default value, but a startWith operator could be added for this, like in Rx.
@zach-klippenstein That's how I modeled my prototype implementation of the ConnectableFlow as well, for the share() and cache() functions: https://gist.github.com/streetsofboston/39c3f8c882c9891b35e0ebc3cd812381
Roman wants to not expose yet another type (ConnectableFlow) from the library and only add one more function, called share(), for the most-used use cases so that the Flow API doesn't become bloated. I agree with this sentiment: keep it as small as possible.
And maybe the argument in favor of exposing a new ConnectableFlow type is strong enough (see my previous comment about a connect() extension function) so that it will be added to the Flow API ....
In the end it's up to the maintainers of kotlinx.coroutines to weigh these factors and make a decision :-)
However, I don't think overloading the proposed share() operator with caching functionality (share when size == 0, cache when size > 0) is a good idea. Better to spell out the functions' responsibilities and have a share() function without a size parameter that shares, and a cache() function with a size parameter that caches.
Maybe a separate, but related, issue should be opened for a Flow.cache operator.
@matejdro With the ConnectableFlow from my gist, either would be possible:
Cache survives re-activations of the flow:
flow.replay(size).autoConnect()
Cache does not survive re-activations of the flow:
flow.replay(size).refCount()
This looks good as #1086 implementation. One thing missing is the sharing timeout, which use case is described in #1086 (comment)
Thanks for the timeout reminder. I'll see how it can be incorporated here in some nicer way than just adding an additional "timeout" parameter to this operator.
Say I want to create an extension function out of launch { flow.collect {} } called connect:
We actually plan this kind of operator, tentatively to be called launchIn(scope), for all kinds of flows. The only missing piece in this operator's design is user-friendly error handling.
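The planned operator can be sketched in a few lines. Since launchIn was only a tentative name at this point, the helper below deliberately uses a different name:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Sketch of the planned operator: launch a collecting coroutine in the
// given scope and return its Job so the caller can cancel it later.
fun <T> Flow<T>.launchCollectIn(scope: CoroutineScope): Job =
    scope.launch { collect() } // collect() with no block just drives the flow

fun main() = runBlocking {
    val seen = mutableListOf<Int>()
    flowOf(1, 2, 3)
        .onEach { seen += it }
        .launchCollectIn(this)
        .join()
    println(seen) // [1, 2, 3]
}
```

The open design question mentioned above (error handling) is visible here: any exception from the upstream would cancel the whole scope unless handled inside the chain.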
However, I don't think overloading the proposed share() operator with caching functionality (share when size == 0, cache when size > 0) is a good idea. Better to spell out the functions' responsibilities and have a share() function without a size parameter that shares, and a cache() function with a size parameter that caches.
I don't see how to decouple share and cache functionality. It seems that you cannot cache without sharing, so we can just as well have a single operator with an optional integer parameter that defaults to 0. No need to overload, because share() is conceptually the same as cache(0).
@elizarov
you can always launch a dummy collector: launch { flow.collect {} }
But then the scope remains active as long as that job is running.
@fvasco
That's true. It can be cancelled, though. And it will resume when, in the case of share(), all subscribers/collectors stop collecting.
I don't see how to decouple share and cache functionality
My prototype implementation decouples the two. It requires two private implementations of ConnectableFlow, though...
My prototype implementation decouples the two. It requires two private implementations of ConnectableFlow, though...
I actually envision a single implementation where "replay size" (of zero or more) is just a configuration parameter.
That proposal looks good. It leaves me wondering about the handling of Throwables, though:
If you need to handle errors for all collectors in a shared flow (that is, handle emitter's errors), then you have to put error-handling operators before you call share(). Btw, take a look at the error-handling proposal: https://github.com/Kotlin/kotlinx.coroutines/issues/1263
My consideration is that the launch trick has non-trivial consequences. This may be related to #1065.
I actually envision a single implementation where "replay size" (of zero or more) is just a configuration parameter.
You can design it to just have one "share" method with a replay-size parameter, but my point was that decoupling the share and cache use cases is possible. I'd argue for decoupling, but you can argue for one "share" method as well.
Cache survives re-activations of the flow: flow.replay(size).autoConnect()
@streetsofboston This is slightly different. autoConnect preserves the upstream connection as well. RxReplayingShare will still disconnect upstream when the ref count hits zero (like the refCount operator), but it will preserve the cache and continue to replay it to new subscribers. The latter is useful if the upstream subscription represents some expensive resource that should be shut down if nobody is actively listening, but you still want some initial value to emit immediately if the upstream takes a while to "warm up", like @matejdro suggested.
In general I am wary of autoConnect in Rx because it's so easy to misuse and accidentally leak the upstream connection. Structured concurrency makes that a lot safer, since the connection requires the collection to happen in a particular scope.
I think that there shouldn't be any default value, but startWith operator could be added for this like RX.
@matejdro The function of startWith, at least as Rx defines it, is unrelated to multicasting and caching, but I see what you meant. I read that open question as referring to the default value of the cache-size parameter (zero, one, or something else), not the default value of the cache itself, which, as you said, would just be empty until the upstream emits something.
In @streetsofboston's proposal, share and cache are "sibling" operators with disjoint functionality, but both are composed of the same two steps: multicasting and automatic connection. I also prefer that naming because caching implies sharing, but sharing does not imply caching.
You could also make share and cache coordinate through operator fusion (like the recent change to buffer), but it's unclear what cache would do if not adjacent to share, since it doesn't make sense to cache for a single collector:
flow.share() // returns a multicasted flow
.cache(1) // configures the multicasted flow to cache 1 emission
I don't mind using a single operator for both, as long as the operator name communicates that it does both multicasting and caching (e.g. shareReplaying).
Having a cache() operator for Flow would be misleading, as you might use it incorrectly on any Flow. The fact that a flow is shared and has a cache is an implementation detail, and we should avoid operators that make assumptions about the source Flow. Since cache only makes sense when the Flow is shared, and needs to be applied only on the sharing flow, not by the consumers, it should be a configuration, or an overload, of the share operator.
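Making replay a configuration of sharing rather than a standalone operator could look roughly like this. This is a hypothetical, eagerly-started sketch built on MutableSharedFlow; the name shareWithReplay is invented, and a real implementation would also handle collector counting, timeouts, and cancellation:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical sketch: replay is just a parameter of the sharing operator.
// The upstream is collected eagerly in `scope`; late collectors get the
// last `replay` values from the buffer.
fun <T> Flow<T>.shareWithReplay(scope: CoroutineScope, replay: Int = 0): Flow<T> {
    val shared = MutableSharedFlow<T>(replay = replay)
    scope.launch { collect { shared.emit(it) } }
    return shared
}
```

With this shape, `shareWithReplay(scope)` is plain sharing and `shareWithReplay(scope, 1)` additionally hands the latest value to late collectors, which matches the "share() is conceptually cache(0)" argument above.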
How will the updated signature shareIn(scope) work with several dynamically connecting and disconnecting collectors? I.e., there are use cases for the share operator that require the flow to keep track of the scopes of its consumers. Here is a quote from a Slack message with such a use case:
What I'm looking is to create long time operation that persists through multiple scopes. For example:
- Screen A (with Scope A) starts long operation
- User presses a button to switch to Screen B
- Screen B (with Scope B) also starts this long operation, but since this operation is already running, it would just somehow "add its scope"
- Screen A is closed, so Scope A that started the operation is cancelled. But since Scope B is still using this operation, it would not stop until Scope B is also cancelled.
shareIn(GlobalScope) would do the trick.
shareIn(GlobalScope) would do the trick.
Would it close the flow if all of the consumers are cancelled? In terms of the above example, that would mean cancelling the long-running operation in the flow if both consumers, Screen A and Screen B, are closed.
@voddan Yes, that's the whole point of having a share operator for Flow instead of a ConflatedBroadcastChannel that you have to close manually.
Here is my use case.
I'd like to represent a stream of clicks as a Flow<Unit>. It should support multiple subscribers, registering a listener on subscription (at least one subscriber) and unregistering on unsubscription (no more subscriptions). I believe in RxJava this would be represented by publish().refCount() - but that naming is pretty terrible, and a lot could be improved here.
My UI might have the following interface:
interface MyUi {
fun clicks(): Flow<Unit>
}
On Android, this means using setOnClickListener { .. }. This means that when the clicks(): Flow<Unit> is subscribed to, I want setOnClickListener { .. } to be registered. When it is unsubscribed from, I want setOnClickListener(null) to be called, so the listener is unregistered.
This scenario works great for one subscriber, but falls short when you want multiple subscribers. Take the following scenario (note: I know .onEach { } could be leveraged, but that reduces flexibility):
fun onCreate() {
ui.clicks()
.onEach { viewModel.send(ClickEvent) }
.launchIn(onCreateScope)
ui.clicks()
.onEach { log.v("Ui clicked!") }
.launchIn(onCreateScope)
}
viewModel.send(ClickEvent) now stops receiving events because of the second subscription. I would like to be able to share an instance of Flow<T> between two subscribers, so setOnClickListener { .. } is registered once for both (when the subscriber count goes from 0 to 1), and the listener is then unregistered via setOnClickListener(null) when the subscriber count goes from 1 to 0.
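The listener-backed flow itself can be written with callbackFlow; the missing piece is the multicasting layer being requested. Below is a sketch with a stand-in for the Android view (FakeButton and its API are invented for illustration):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.awaitClose
import kotlinx.coroutines.flow.*

// Stand-in for an Android View: a single, last-wins click listener.
class FakeButton {
    var listener: (() -> Unit)? = null
    fun performClick() { listener?.invoke() }
}

// Cold flow: registers the listener on collection, unregisters on cancel.
// With two plain collectors the second registration overwrites the first,
// which is exactly the problem share() would solve.
fun FakeButton.clicks(): Flow<Unit> = callbackFlow {
    listener = { trySend(Unit) }
    awaitClose { listener = null }
}
```

Collecting this flow twice demonstrates the "last listener wins" failure described above; wrapping it in the proposed share operator would register the listener once for all collectors.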
Hi folks, happy Sunday. I wanted to check if there is any progress or a timetable on this operator. Thank you kindly.
Food for thought: customizing share behavior over the default could be controlled via an optional lambda. Example:
stateFlow.shareIn(scope) {
autoConnect() // could be renamed `autoCollect`, `collectImmediately()`, `collectImmediate()`
}
A downside would be the extra class definition.
One common "gotcha" is that when sharing:
stateFlow.shareIn(scope)
the first subscriber will get the replayed State value from the StateFlow<State>, but new subscribers won't. Could there be a strategy to potentially mitigate this? (I can't think of one, other than a lint warning.)
@ZakTaccardi I was thinking about a separate operator for state flows that avoids this problem and returns a StateFlow. This way you can share chains of state computations. Moreover, this operator would be defined on a Flow so that you can first do state.map { ... } (which gets you a Flow) and then "share the resulting state". It means, though, that we cannot reuse the share name and have to invent another name for it, which is a challenge.
We needed a similar operator in Store and implemented a custom version. Just linking here for reference. https://paper.dropbox.com/doc/Store-Flow-Multiplexing--ApUW0sTj9Oyl7iRxyNBaBOK5AQ-GcFETeZI9pHEk2zmnHfW4 https://github.com/friendlyrobotnyc/Store/tree/master/store/src/main/java/com/nytimes/android/external/store4/impl/multicast
The implementation works by launching an actor when the first downstream observer shows up.
Each downstream receives an unlimited channel to which the actor sends the values it receives from upstream.
We wanted to avoid abusing the upstream when downstreams are slow to collect. To achieve that, each message is acked by the downstream when collected, and the upstream waits for an ack before collecting another value.
In terms of buffering, by default we do not buffer. If a downstream arrives so late that it does not end up getting any new values, we simply create a new producer. There is also an option to buffer.
We've also added piggybacking where downstream can be kept open even after upstream closes to receive new values if we end up creating another producer for another downstream (this is where it gets really custom to our use case).
The implementation is probably not great and more expensive than it needs to be, but it works for our use case.
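The ack mechanism described above can be sketched with a rendezvous channel carrying a completable acknowledgement alongside each value. This is a simplified single-consumer sketch, not the actual Multicaster code:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.flow.*

// Each message carries a value plus an ack that the consumer completes
// once it has actually processed the value.
class Message<T>(val value: T, val ack: CompletableDeferred<Unit> = CompletableDeferred())

// The producer sends a message, then suspends on the ack before pulling
// the next value from upstream - so a slow consumer throttles upstream.
fun <T> Flow<T>.withAcks(scope: CoroutineScope): Channel<Message<T>> {
    val out = Channel<Message<T>>() // rendezvous channel
    scope.launch {
        collect { value ->
            val msg = Message(value)
            out.send(msg)
            msg.ack.await() // don't touch upstream until the consumer acks
        }
        out.close()
    }
    return out
}
```

A consumer would then loop `for (msg in ch) { process(msg.value); msg.ack.complete(Unit) }`, acknowledging each value after processing it.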
Could someone please clarify what exactly would be the differences between this future shareIn() operator and the currently available broadcastIn().asFlow()? Thanks in advance.
The idea is that shareIn() will automatically start collection from the upstream when needed. The difference is that broadcastIn().asFlow() starts collecting immediately and so becomes a "hot" stream that is always collecting upstream, regardless of the presence of downstream collectors.
Just wanted to say that share (refCount, replay and friends) is the last blocker for me to switch to Flow.
@ursusursus
Just wanted to say that share (refCount, replay and friends) is the last blocker for me to switch to Flow
My gist of #1086 may help you: https://gist.github.com/streetsofboston/39c3f8c882c9891b35e0ebc3cd812381
Migrating from RxJava, and I just realized that this operator is not yet implemented in Flow. 😢
@GianfrancoMS I was trying to move from WebFlux to Flow and also running into this issue. Is there an easy workaround, to at least have something that works?
See my earlier comment; it may be of help for you: https://github.com/Kotlin/kotlinx.coroutines/issues/1261#issuecomment-560980521
@GianfrancoMS
Migrating from RxJava, and I just realized that this operator is not yet implemented in Flow. 😢
If you do not mind making it "hot", you can use broadcastIn.
It's working now. My use case is pub/sub where I get events in from Kafka, and there could be multiple clients that should get the same event. https://github.com/openweb-nl/kafka-graphql-examples/tree/ge-kotlin/graphql-endpoint
I'd like to revisit the fusing/conflation of cache and shareIn.
It seems to me that the most common (Android) use case for cache is the one of an exposed LiveData or ConflatedBroadcastChannel-backed Flow in a ViewModel. When a configuration change happens, you lose that data and have to start fresh... But how many times do we actually observe that LiveData from more than one observer? I'd say almost never.
If we were to get that multicasting ability for free, that'd be one thing, but it clearly isn't free.
Here's my own gist for a SharedFlow: gist
I think it makes much more sense to chain the operators: myFlow.cache(1).shareIn(scope). With my implementation, I don't see a real need to do a fusion like with buffer and context.
Here's a gist for a cold cache(size: Int) Flow using a CircularArray: gist
I don't love cache(size: Int) as the operator name. I feel like the English meaning is far too similar to buffer and can lead to lots of confusion and misuse. replay(size: Int) feels much more distinctive to me.
If no one has strong objections, I can get them PR-ready in a few days.
@digitalbuddha did the trick here https://github.com/dropbox/Store/blob/master/multicast/src/main/kotlin/com/dropbox/flow/multicast/Multicaster.kt
Here is my take on the share operator: https://gist.github.com/matejdro/a9c838bf0066595fb52b4b8816f49252
It supports conflating and the timeouts described in https://github.com/Kotlin/kotlinx.coroutines/issues/1086#issuecomment-485675107. From what I see, the implementation is a bit simpler than the links above (it just uses an actor that loops through events), but it is not very fast - I would not use it to emit thousands of events per second. However, in my projects I almost never need that big of a throughput, so it fits my use cases.
Hi @matejdro, I have seen your gist, and it looks good.
What do you think of using an extension function at this line? https://gist.github.com/matejdro/a9c838bf0066595fb52b4b8816f49252#file-multicastflowtest-kt-L139
Yes, that would probably be a good idea.
I've also come across a use case for this. Right now, doing it requires imperatively collecting the flow prematurely.
I just noticed this one:
It will have to be, actually, a shareIn(scope) operator. Otherwise, the scope it works in will be bound to the first collector, and when this collector is cancelled it will not be able to maintain emissions to other collectors.
I think this unnecessarily complicates the use of this feature. Flow is already scoped by the downstream (it starts collecting when the first collector starts and stops when all collectors are cancelled). Introducing another scope would significantly complicate implementations where such a share operator is exposed by stateless objects.
It definitely makes the stream more complicated to reason about, but I don't think it does so unnecessarily. Multicasting inherently involves a scope that is separate from that of any of the downstream collectors, because no downstream collector is guaranteed to be active for the lifetime of the upstream collection. That scope could be implicit, as in the case with a reference-counting approach, but sometimes you need it to be explicit: if the owner of the multicasting operation has its own scope (e.g. is a service-type object that gets shut down when a user logs out of an app), it may want to explicitly manage the lifetime of that upstream collection and cancel it when the service object itself is shut down.
The share() operator operates on Flow<T> and returns Flow<T>. It shall have the following semantics: the resulting flow is cold, but when one collector starts collecting from it, it starts to collect from the upstream flow, activating the emitter upstream. The trick of the share operator is that when additional collectors appear downstream, they all "share" the same upstream emitter. For example, consider the flow:
If you launch two collectors:
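The two code blocks did not survive this copy of the issue; the following is a reconstruction consistent with the quoted output (the emission period is illustrative):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// The emitter prints each value as it is produced...
val exampleFlow = flow {
    var i = 0
    while (true) {
        delay(100)
        println("Emit $i")
        emit(i++)
    }
}

// ...and the two collectors print what they receive.
fun main() = runBlocking {
    val a = launch { exampleFlow.collect { println("A: got $it") } }
    val b = launch { exampleFlow.collect { println("B: got $it") } }
    delay(250) // observe a few emissions, then stop
    a.cancel(); b.cancel()
}
```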
Then you shall see "Emit 0 / A: got 0 / Emit 0 / B: got 0 / Emit 1 / A: got 1 / Emit 1 / B: got 1 / ...".
However, if you change the flow to
val flow = flow { /* same */ }.share()
, then you shall see "Emit 0 / A: got 0 / B: got 0 / Emit 1 / A: got 1 / B: got 1 / ...", that is, one emission gets delivered to both collectors.
Now, if you need to artificially "start" the shared flow simply to keep it active, you can always launch a dummy collector:
launch { flow.collect {} }
that works as a "reference" which is active until you cancel the resulting job.
TBD: The share operator might need some configuration with respect to how much "history" to keep in memory for "late collectors". So far it seems that one non-negative integer is enough (with zero, new collectors don't get any history; with one, only the most recent value; with more, the specified number of recent values). What is unclear is what shall be the default value (if any).
UPDATE: It will have to be, actually, a shareIn(scope) operator. Otherwise, the scope it works in will be bound to the first collector, and when this collector is cancelled it will not be able to maintain emissions to other collectors.