Closed (linusdm closed 1 year ago)
I will investigate but note there is no benefit in shuffling after a from_enumerable. Shuffle is only useful when shuffling across multiple flows or increasing/decreasing the number of stages after. If there is no processing before it, there is no benefit.
Thx! Is it still useful when configuring a window, without changing the number of stages after?
I was changing to shuffle instead of partition because I have an embarrassingly parallel problem at hand. My line of thinking was that if I can avoid the hashing in the partition step, I could save some time. But maybe I'm misinterpreting this part in the documentation:
However, notice that unnecessary partitioning will increase memory usage and reduce throughput with no benefit whatsoever. Flow takes care of using all cores regardless of the number of times you call partition. You should only partition when the problem you are trying to solve requires you to route the data around. Such as the problem presented in Flow's module documentation. If you can solve a problem without using partition at all, that is typically preferred.
You can configure the window on from_enumerable, no?
^ yes. I had some misconceptions about Flow.shuffle/2.
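For reference, a minimal sketch of configuring a window directly on from_enumerable, with no shuffle or partition step in between. It assumes the flow package is available (for example via a Livebook setup cell) and that Flow.from_enumerable/2 accepts the :window option, as the reply above suggests. The 1..6 input, the count window of 2, the single stage, and the summing reducer are illustrative choices, not taken from the thread.

```elixir
# Assumption: the :flow dependency is already installed in this session.
# A count-based window that closes after every 2 events (illustrative size).
window = Flow.Window.count(2)

result =
  1..6
  # The window is passed as an option here, so no Flow.shuffle/2 or
  # Flow.partition/2 call is needed just to attach it.
  |> Flow.from_enumerable(window: window, stages: 1)
  # Sum the events seen in each window.
  |> Flow.reduce(fn -> 0 end, fn x, acc -> x + acc end)
  # When a window triggers, emit its sum and reset the accumulator.
  |> Flow.on_trigger(fn sum -> {[sum], 0} end)
  |> Enum.to_list()

IO.inspect(result, label: "per-window sums")
```

Since nothing is routed between stages here, this avoids both the hashing of a partition and the extra layer of stages a shuffle would add.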
For what it's worth, I saw the same error when merging flows with Flow.merge/3 with a GenStage.DemandDispatcher, which might be a more realistic use, like this:
flow =
  Flow.merge(
    [
      Flow.from_enumerable(1..3),
      Flow.from_enumerable(4..6)
    ],
    GenStage.DemandDispatcher
  )

Kino.start_child(%{id: MyFlow, start: {Flow, :start_link, [flow]}})
Using the GenStage.PartitionDispatcher as second argument to Flow.merge/2 does not yield an error.
It seems a GenStage consumer stage is started with the forbidden option :dispatcher (the :dispatcher option is only applicable to :producer and :producer_consumer type stages, which sounds reasonable because a consumer stage shouldn't dispatch, as it's last in the chain).
Sorry if I'm stating the obvious... I find Flow quite interesting and I'm learning a lot.
I'll try to add a unit test that triggers this situation in a PR.
I'm hitting an issue when I try to combine Flow.shuffle/2 with Flow.start_link/2. For example, if I start this simple flow as a child in Kino/Livebook, the error
unknown options [dispatcher: GenStage.DemandDispatcher]
is raised.
Running the same flow from the current process does not hit this error and yields the expected result.
I only hit the error when running it with Flow version 1.2.0 and the most recent version, 1.2.1. Version 1.1.0 seems to be ok. Could this be a regression since 1.2.0?
I understand that the GenStage.DemandDispatcher is used when using Flow.shuffle/2 (as opposed to the GenStage.PartitionDispatcher when using Flow.partition/2). But I'm confused as to why this :dispatcher option would be unknown at that place.
Full error: