ekmett / machines

Networks of composable stream transducers
Other
339 stars 46 forks source link

Add a new fanout operator which can terminate #70

Closed bens closed 8 years ago

bens commented 8 years ago

I've found the existing fanout functions surprising because they don't terminate, like

> run (source (map (:[]) [1,2,3]))
[[1],[2],[3]]
> run (fanout [source (map (:[]) [1,2,3])])
*hang*

with the (~*>) operator (happy to take name suggestions), it does

> run (stopped ~*> [source (map (:[]) [1,2,3])])
[[1,2,3]]

I've also used difference lists to avoid reversing the list of Processes with each step, but not attached to that if it's a bad idea. Hoping for some feedback on whether this is wanted or could be improved.

I can also add a fanoutSteps equivalent as well, with a Monoid constraint instead of Semigroup.

ekmett commented 8 years ago

I'd be curious to see what the feedback we can get here from other folks who use the existing fanout machinery looks like.

bens commented 8 years ago

Not much feedback then :( I've written a property test which backs up my initial comment: https://gist.github.com/bens/5df926362cc3fabefa80 - but I don't yet have any sense of if this is something useful enough to pull in. Any suggestions for other properties to try would be appreciated too.

acowley commented 8 years ago

I'll take a close look today. Sorry for the lack of feedback!

acowley commented 8 years ago

"today" ugh.

So I've investigated this, and I agree the current behavior is problematic. I took a crack at reformulating the beginning and ending stages of the chain of Awaits created by fanout without changing the overall structure. Namely, without manually stepping through the upstream machine.

I did this by first flushing every given machine, and then stopping when upstream has stopped or we all members of the fanout have stopped.

In trying to understand the issue, I've put this on the fanout branch of my fork, here's the commit.

Does this look like it does the right thing to you, @bens? I'm not committed to either approach, but I try to avoid manually stepping through operands of composition-like connectors in general for the sake of insulation to changes, and simplification of logic that deals with composition (e.g. rewrite rules).

acowley commented 8 years ago

Just to ping this to prevent it getting buried... @bens do you think there's any benefit to pulling in some of what I showed in my approach to this behavior change? I think the fix to fanout you've identified is worth getting into the library, so let me know if I can help getting it done.

bens commented 8 years ago

Sorry @acowley, I was away for the weekend and my evenings have been occupied. I'll take a closer look over the weekend definitely or earlier if I can.

bens commented 8 years ago

I changed my PR though I don't want it merged yet, I left out diff lists to show an approach which seems simpler to me. It takes me a while to understand what the fanoutGo, flushYields, etc code does where the version I have seems much more to the point to me. I'll make a better version with diff lists shortly.

bens commented 8 years ago

Ok, I've got it looking good to me, @acowley would you mind having a look?

bens commented 8 years ago

I did some benchmarking of our two versions: https://gist.github.com/bens/01955aa7f47b0b1437e9, I'll do some more testing on my version too.

acowley commented 8 years ago

That's excellent, @bens! Thanks for digging into it.

bens commented 8 years ago

In testing I found a problem with the fanout branch as well, or at least it looks like it to me:

> let m = (:[]) <$> construct(do{ x <- await; y <- await; yield x; yield y})
> mconcat . run $ source [0..] ~> m
[0, 1]
> mconcat . run $ source [0..] ~> AC.fanout [m]
[1, 0]
> mconcat . run $ source [0..] ~> BS.fanout [m]
[0, 1]

So I have to prefer my version, since it's about 3x faster (in the benchmarks I've done at least) and doesn't have at least one bug that I've found. Does that seem fair?

bens commented 8 years ago

I was thinking that maybe it's not worth calling this a bug if the Semigroup output of each step is considered to not care about order (commutative?) but it still seems strange that the output of (yield x >> yield y) would produce [y,x]. Even so there's a large speed difference between the two. Are there any objections to merging this PR?

acowley commented 8 years ago

No problems; I'm on board. I just wanted to take one last look at things, but I think it's all good.

Edit: Oh, I don't seem to have merge rights. You'll need @ekmett to sign off.

bens commented 8 years ago

Great, I'll wait for @ekmett to have a look then. Thanks @acowley!

bens commented 8 years ago

I squashed the history on my branch, I hope that's not a problem. Sorry for the noise but I think it's better here than in the history :)

ekmett commented 8 years ago

LGTM.

Expecting without types or warning commutativity on the Semigroup gives me pause, but we can document that the canonicity of the construction holds when the semigroup is commutative, and describe the behavior carefully for when it is not.

bens commented 8 years ago

Thanks @ekmett. To be clear, the version in this PR returns the same result as without fanout (when wrapped in mconcat) so that doesn't need documenting at least.