proposal: spec: clone and splice, new channel primitives

robpike commented 6 years ago

Here are a couple of suggestions made by Doug McIlroy, original author of test/chan/powser[12].go and instigator of pipes in Unix. They are intriguing.

In Doug's words:

====

splice c1 c2 where channel c1 contains no buffered data, identifies the write end of channel c1 with that of channel c2. Channel c1 becomes write-only.

clone c makes a new read-only channel positioned at the same place in the data stream as readable channel c. Thereafter both channels read the same data sequence from the stream at unrelated rates.

Splice allows a filter to elide itself from a pipeline when it has no further substantive work to do, rather than going into a copy loop.

Clone enables (buffered) fanout of a data stream. Buffered data items may be garbage-collected when they have been delivered to all readers.

These two capabilities are of general utility in stream processing. golang.org/test/chan/powser1.go is one application that could profitably use them--to eliminate the many instances of Copy and also the per-element go-routines spawned for every Split. Some Unix variants have offered pipe splicing, and fanout is a staple of dataflow algorithms. The current workarounds consume an awful lot of bootless machine cycles.

urandom commented 6 years ago

Are there any more details on how the clone works? Say that we have the following code:

var parent, cl chan int

go func() {
    for {
          select {
          case i := <-parent:
                fmt.Println(i)
          }
    }
}()

go func() {
    for {
          select {
          case i := <-cl:
               fmt.Println("Clone", i)
               if i == 3 { return }
          }
    }
}()

for i := 0; i <5; i++ {
     if i == 2 {
             cl = clone parent
     }
     parent <- i
}

Would that print something like:

0
1
2
Clone 2
3
Clone 3
4

I.e. would writing 4 to parent block since reading from cl stopped after receiving 3? How would things change if parent was buffered, would it matter at all? What if parent wasn't read from at all, would writing to it block if there are active clones?

Also, I have no idea what splice does from the description :)

kardianos commented 6 years ago

If I read it right, splice c1 c2 makes c1 point to c2 so a write to c1 can be read from c2.

urandom commented 6 years ago

@kardianos So this is useful for cases where you want to weave two 'streams' into one, as it work. I.e. you read from c2, and write to both c1 and c2 and get the values from both channels while you read, as they are written? Or the reverse of clone as it were?

jimmyfrasche commented 6 years ago

edit: discussion moved to #26343

I'd like to add drain(c) for consideration.

drain(c) would be roughly equivalent to the current idiom of

go func() {
  for range c {
  }
}()

except that drain doesn't create a goroutine: it just has c discard any receives in the scheduler without blocking the sender and causes other attempts to receive from c to panic. It's kind of like close for readers.

The code it replaces follows a fairly simple pattern but, even if the compiler recognized it, it wouldn't be able to perform the same optimization as that relies on the certainty that c can no longer be read from in another goroutine.

This is less useful than splice or clone but this seems like a good place to talk about additional channel primitives that would be more efficient and less fragile as builtins.

Also, if the drained channel was created by clone it could be removed from the fan out logic since the runtime knows that no one's listening, so it is somewhat more related than it appears.

neild commented 6 years ago

Would splice c1 c2 require that c1 and c2 have the same element type, or merely compatible ones? Can I splice a chan int64 into a chan time.Duration? A chan int32?

robpike commented 6 years ago

I would expect that c1 and c2 would need identical channel types.

urandom commented 6 years ago

@jimmyfrasche When a channel is set to be drained, would that include a special syntax on the write side that would inform the writer that the channel is being drained and no longer processing the data?

if drained := c <- someData; drained {
    return
}

urandom commented 6 years ago

Also, if the channel being drained is a clone, would the stream automatically stop sending to it?

jimmyfrasche commented 6 years ago

Never said this explicitly: It would only be safe to use drain when there was a single reader (possibly of a cloned chan) the same way it's only safe to close a chan when there's a single writer.

@urandom drain(clone(c)) would cancel out. Whatever runtime voodoo handles the fan out necessary to support clone could immediately drop any drained channel. That's just an optimization, but it seems like a useful/easy one.

I suppose you would need to detect a drained channel. There's something disquieting about making the send statement into an expression, but detection would need to be paired with a send.

If you couldn't detect it, something like

go func() {
  for id := 1;; id++ {
    c <- id
  }
}()

would just spin indefinitely. Though that is the case now if you create a drain goroutine manually. If this example detected a drain, it could just return.

That makes me wonder about something like

for {
  select {
  case drainedChan <- 1:
  case regularChan <- 2:
  }
}

drainedChan can always send but regularChan might not be able to. To keep the semantics you'd need to hit the drainedChan case but you may need to prioritize sends to regularChan since it's the only one doing work outside of the goroutine. In this example, if you could detect that the chan was drained, you could set the chan to nil.

(I'm still waiting for someone smarter than me to explain why this is a terrible idea, though :laughing: )

neild commented 6 years ago

Channels currently pass data in only one direction. Being able to detect a drained channel would make that bidirectional. It seems simpler and more consistent for a drain to be precisely equivalent to go func() { for _ = range ch { } )() (aside from not creating a new goroutine).

robpike commented 6 years ago

This has become a discussion about an operation not even mentioned by the proposer. If you want to talk about drain, which is a fine thing to talk about, please open a new issue for that.

jimmyfrasche commented 6 years ago

@robpike sorry for the hijack. To return to the original proposal, the definition of clone seems to imply infinitely-buffered channels.

I would have expected cloning a channel to behave something more like:

For an unbuffered chan, c, c <- v blocks until c and all clones of c have received v.

For a buffered chan, c, c <- v blocks if c or any clones of c have filled their buffer.

jimmyfrasche commented 6 years ago

What happens if I do this:

splice(c1, c2)
close(c1)
c2 <- v

Do receives on c2 always succeed instantly getting a zero value unless there's a blocked send on c2?

Does c2 close as well?

Does it take c1 out of the equation leaving c2 as it was before the splice, effectively unsplicing the two?

Does it just panic?

bakul commented 6 years ago

Splice: The use case of splice seems to be this: a filter that receives data from c1 and sends some subset of, or transformed, data out to c2. If at some point the filter just wants to passthrough data, doing no filtering, splice c1 c2 would patch through the data. That is anything sent to c1 can be read from c2, allowing the filter goroutine to get out of the way (or terminate). If so, any buffered data in c1 can be copied out to c2 at the time of splice (though that would potentially block splice until there is enough space in c2!).

Clone: Is c1 := clone c0; c2 := clone c1 equivalent to c1 := clone c0; c2 :- clone c0? Assuming no intervening reads.

Given a channel with an N item bluffer, the leading item can not be discarded until all the receiver pick it up, right? Would it make sense to add an operation to expand buffering?

This is not strictly a clone operation since not all the functionality is cloned. It is more of a fanout. There can be a corresponding fanin (which sort of overlaps with the splice operation). Conceptually:

cA, Cb, Cc, ... := fanout cR
fanin cA, cB, Cc .. cW

For fanout, cR can be a readonly channel. For fanin, cW can be a write only channel. Fanout from cR means anything written to the other end of cR can be read from cA, cB, cC etc. Fanin to cW means anything written to cA, cB, cC etc. can be read from the other end of cW.

An equally interesting operation would be the inverse of splice. Suppose the same filter now needs to be reapplied based on some condition (or you want to tap into a channel). I can envision

cIn, cOut := unsplice c
go myFilter(cIn, cOut)

The effect is as follows:

before: producer =>c=> consumer
after:  producer =>c ~~ cIn=> filter =>cOut ~~ c=> consumer

The reader and writer of c need not be aware of the existence of the filter tap. Any buffered data in c at the time of unsplice can be read from cIn by the filter (and not by the consumer of c), since a decision to insert a filter can be triggered based on received items from c.

PS:

I used fanin & fanout to make the symmetry explicit. I don't care what actual keywords get used.
I think Go should've generated read and write end explicitly (just like unix pipe(2)). Something like cIn, cOut := make(chan int, ...). Too late now. Another useful function may have been to allow creating bidirectional channels. E.g. c1, c2 := make(chan int, 5, "bidirectional") so that writes on c1 get read from c2 and vice versa. This would make unsplice even more useful (e.g. inserting an protocol or encryption module based on negotiation).
Would've preferred it if these things were possible to implement in user code, rather than build in the language.

jimmyfrasche commented 6 years ago

@bakul

[A]ny buffered data in c1 can be copied out to c2 at the time of splice (though that would potentially block splice until there is enough space in c2!).

That does seem reasonable to me and a bit more user-friendly. I have been wondering about that restriction, too. Maybe there's a good reason for it that I'm not seeing.

This is not strictly a clone operation since not all the functionality is cloned.

That's true. Perhaps split would be a more descriptive name. (And split/splice is a fun word combo.)

fanin/fanout

This could be achieved by multiple calls to splice/clone.

An equally interesting operation would be the inverse of splice. Suppose the same filter now needs to be reapplied based on some condition (or you want to tap into a channel).

Could you file a separate proposal for that?

Would've preferred it if these things were possible to implement in user code, rather than build in the language.

The equivalent functionality is possible in user code, but it requires additional channels and goroutines. This proposal would avoid that overhead and make it simpler to build more complex topologies.

bakul commented 6 years ago

Could you file a separate proposal for that?

I am just exploring features that may dovetail with this proposal at the moment.

The equivalent functionality is possible in user code, but it requires additional channels and goroutines.

What I was getting at is to be able to implement something similar in user code without needing additional channels or goroutines. The proposed changes seem non-trivial and are perhaps better suited as a library component of some sort. Plus I would like to be able to avoid context switching overhead for simple but useful operations. For example:

func sum(a, b <-chan int) <-chan int {
    c := make(chan int, ...)
    go func() for {  c <- (<- a) + (<- b) }()
    return c
}

[yeah, I know this has problems; the code is just to illustrate the idea] Conceptually this can be implemented without a goroutine: a receive on c will result in a receive on a nd b and if either blocks, so does the receive on c. The receive returns when both a and b yield a value and a sum can be written into c. Symmetrically, if a value is written into a or b, the sum can be done apriori and written into c. Syntax wise c := a + b captures this perfectly but I don't expect any Go authors to go for that!

IMHO a pipe is a better abstraction than an iterator in other languages and it would be nice if it is available without needing to use goroutines. Since this proposal started from a suggestion by Doug McIlroy, I couldn't help wondering if we can make Go channels as easy to use as pipes are in a shell!

On the other hand I can't think of a lower level building block to do this as yet.

CAFxX commented 6 years ago

Regarding splice-like functionality: I have sometimes encountered the case in which I have two channels, one from which I'm supposed to read and another I'm supposed to write to but -and here's the tricky bit- only when the two operations (receive and send) are not going to block. Something similar to this, but in the following example the send on writeCh could block so it's not quite what I needed:

for {
  select {
  // ... other cases ...
  case e := <-readCh:
    // ...optionally drop/manipulate e here...
    writeCh <- e
  }
}

There are some ways in which this can be written to satisfy the non-blocking requirements but they tend to get hairy and in any case they (in general) require buffering at least one element (even if both channels are unbuffered) or creating dedicated "channel pump" goroutines.

I am not sure if this is a common enough scenario, but I wanted to submit it for consideration anyway because, as I mentioned above, I found myself in this situation a couple of times. I am aware that refactoring the code would solve the problem: to my credit, in both cases, when I encountered this the channels were provided by libraries, so there wasn't much I could do.

pepijndevos commented 6 years ago

I've ran into the exact problem @CAFxX described, and indeed solved it by completely refactoring everything. But I think it'd be really useful to have completely unbuffered pipelines. May I suggest the following extension to splice:

splice a b func

Which would be of type

chan a, chan b, func(b) a

This essentially imlements a completely unbuffered map operation on channels. Maybe that opens a whole new can of worms about things like filter and reduce, but nonetheless I think it is useful to be able to write pipelines without implicit buffers.

bcmills commented 6 years ago

This essentially [implements] a completely unbuffered map operation on channels.

Polymorphic functions (#15292) seem like a better fit for that than broad built-in operators.

The operators that @robpike proposes interact with channel internals in an interesting way: they potentially allow the runtime to alias buffers between channels.

A splice variant that feeds through a transform function, in contrast, requires an extra allocation for each channel (to store the transformation function) and does not allow repeated operations to collapse: you can't skip any function in the chain. That makes it different enough that I wouldn't want to call it the same operator.

j7b commented 6 years ago

The splice operation is a dangerous notion for a type that's intended to be shared between goroutines (how can c1 become write-only without invalidating declarations?), and generally unnecessary, for example https://play.golang.org/p/YaqdWEwcxQa eliminates Copy by giving rat and channels thereof a common interface. On a related note, the operations on series as implemented don't need a split that spawns goroutines, at most one term of a source series needs to be "buffered" and the reads happen in a well-defined order. The common interface was inspired by a footnote in "Squinting at Power Series" and the inference about terms and order is from same (I feel indebted to the author, his work made objecting to this proposal one of the most challenging (the paper's well above my pay grade) and rewarding experiences I've had as a humble student of computer science).

The clone operation as proposed makes a key property of channels, synchronization, ambiguous. If a shared buffer is what's desired, a channel is a great way to synchronize access to it, as the original implementation of Split does (where the buffer is basically a linked list of goroutine stacks guarded by channels used as mutexes). Arbitrarily large channel capacity and demand-based allocation of channel buffers could also be a useful alternative to clone, https://github.com/golang/go/issues/20352 proposes explicit creation of channels of infinite capacity.

My thanks again to the proposer and for powser[12].go.

networkimprov commented 6 years ago

EDIT: Could clone be called tee? Clone sounds like a variation of make that duplicates the buffer. And it's easily misread as close.

@j7b for "demand-based allocation of channel buffers" see #20868

ianlancetaylor commented 5 years ago

Presumably after executing c1 := clone(c), writing a value to c will block until there is space available for both c and c1.

What is the buffer size of the result of calling clone? Do you get the same buffer size as the argument? Is clone permitted on unbuffered channels? It could still work though you could only write to the original channel when each clone had a goroutine reading from it.

What happens if you write to the channel that is returned by calling clone? What happens if you read from a channel that is passed to splice? Do the operations panic or simply fail in some way?

jdef commented 5 years ago

Similar, but maybe not the exact same problem (re: splice) that's cropped up a few of my projects: I want to select from an arbitrary number of chans. It's actually possible, and ugly, to do this via the reflect package by aggregating SelectCase's and invoking reflect.Select. Did I mention that it's ugly? It would be more natural to express along these lines:

select {
case <-done:
case signal := <-control:
case i, v, ok := <-sliceOfIdenticallyTypedChans:
  // i, v, ok are the same as returned by `reflect.Select`
  // https://golang.org/pkg/reflect/#Select
}

ianlancetaylor commented 5 years ago

@jdef Thanks, but that is really a different problem that should be discussed on a different issue. This one is about splice and clone.