Open robpike opened 6 years ago
Are there any more details on how the clone works? Say that we have the following code:
var parent, cl chan int
go func() {
for {
select {
case i := <-parent:
fmt.Println(i)
}
}
}()
go func() {
for {
select {
case i := <-cl:
fmt.Println("Clone", i)
if i == 3 { return }
}
}
}()
for i := 0; i <5; i++ {
if i == 2 {
cl = clone parent
}
parent <- i
}
Would that print something like:
0
1
2
Clone 2
3
Clone 3
4
I.e. would writing 4
to parent
block since reading from cl
stopped after receiving 3
? How would things change if parent
was buffered, would it matter at all? What if parent
wasn't read from at all, would writing to it block if there are active clones?
Also, I have no idea what splice
does from the description :)
If I read it right, splice c1 c2 makes c1 point to c2 so a write to c1 can be read from c2.
@kardianos
So this is useful for cases where you want to weave two 'streams' into one, as it work. I.e. you read from c2, and write to both c1 and c2 and get the values from both channels while you read, as they are written? Or the reverse of clone
as it were?
edit: discussion moved to #26343
I'd like to add drain(c)
for consideration.
drain(c)
would be roughly equivalent to the current idiom of
go func() {
for range c {
}
}()
except that drain
doesn't create a goroutine: it just has c
discard any receives in the scheduler without blocking the sender and causes other attempts to receive from c
to panic. It's kind of like close
for readers.
The code it replaces follows a fairly simple pattern but, even if the compiler recognized it, it wouldn't be able to perform the same optimization as that relies on the certainty that c
can no longer be read from in another goroutine.
This is less useful than splice
or clone
but this seems like a good place to talk about additional channel primitives that would be more efficient and less fragile as builtins.
Also, if the drained channel was created by clone
it could be removed from the fan out logic since the runtime knows that no one's listening, so it is somewhat more related than it appears.
Would splice c1 c2
require that c1
and c2
have the same element type, or merely compatible ones? Can I splice a chan int64
into a chan time.Duration
? A chan int32
?
I would expect that c1 and c2 would need identical channel types.
@jimmyfrasche When a channel is set to be drained, would that include a special syntax on the write side that would inform the writer that the channel is being drained and no longer processing the data?
if drained := c <- someData; drained {
return
}
Also, if the channel being drained is a clone, would the stream automatically stop sending to it?
Never said this explicitly: It would only be safe to use drain when there was a single reader (possibly of a cloned chan) the same way it's only safe to close a chan when there's a single writer.
@urandom drain(clone(c))
would cancel out. Whatever runtime voodoo handles the fan out necessary to support clone
could immediately drop any drained channel. That's just an optimization, but it seems like a useful/easy one.
I suppose you would need to detect a drained channel. There's something disquieting about making the send statement into an expression, but detection would need to be paired with a send.
If you couldn't detect it, something like
go func() {
for id := 1;; id++ {
c <- id
}
}()
would just spin indefinitely. Though that is the case now if you create a drain goroutine manually. If this example detected a drain, it could just return.
That makes me wonder about something like
for {
select {
case drainedChan <- 1:
case regularChan <- 2:
}
}
drainedChan
can always send but regularChan
might not be able to. To keep the semantics you'd need to hit the drainedChan
case but you may need to prioritize sends to regularChan
since it's the only one doing work outside of the goroutine. In this example, if you could detect that the chan was drained, you could set the chan to nil.
(I'm still waiting for someone smarter than me to explain why this is a terrible idea, though :laughing: )
Channels currently pass data in only one direction. Being able to detect a drained channel would make that bidirectional. It seems simpler and more consistent for a drain to be precisely equivalent to go func() { for _ = range ch { } )()
(aside from not creating a new goroutine).
This has become a discussion about an operation not even mentioned by the proposer. If you want to talk about drain, which is a fine thing to talk about, please open a new issue for that.
@robpike sorry for the hijack. To return to the original proposal, the definition of clone seems to imply infinitely-buffered channels.
I would have expected cloning a channel to behave something more like:
For an unbuffered chan, c
, c <- v
blocks until c
and all clones of c
have received v
.
For a buffered chan, c
, c <- v
blocks if c
or any clones of c
have filled their buffer.
What happens if I do this:
splice(c1, c2)
close(c1)
c2 <- v
Do receives on c2 always succeed instantly getting a zero value unless there's a blocked send on c2?
Does c2 close as well?
Does it take c1 out of the equation leaving c2 as it was before the splice, effectively unsplicing the two?
Does it just panic?
Splice:
The use case of splice
seems to be this: a filter that receives data from c1 and sends some subset of, or transformed, data out to c2. If at some point the filter just wants to passthrough data, doing no filtering, splice c1 c2
would patch through the data. That is anything sent to c1 can be read from c2, allowing the filter goroutine to get out of the way (or terminate). If so, any buffered data in c1 can be copied out to c2 at the time of splice (though that would potentially block splice until there is enough space in c2!).
Clone:
Is c1 := clone c0; c2 := clone c1
equivalent to c1 := clone c0; c2 :- clone c0
? Assuming no intervening reads.
Given a channel with an N item bluffer, the leading item can not be discarded until all the receiver pick it up, right? Would it make sense to add an operation to expand buffering?
This is not strictly a clone operation since not all the functionality is cloned. It is more of a fanout. There can be a corresponding fanin (which sort of overlaps with the splice operation). Conceptually:
cA, Cb, Cc, ... := fanout cR
fanin cA, cB, Cc .. cW
For fanout, cR can be a readonly channel. For fanin, cW can be a write only channel. Fanout from cR means anything written to the other end of cR can be read from cA, cB, cC etc. Fanin to cW means anything written to cA, cB, cC etc. can be read from the other end of cW.
An equally interesting operation would be the inverse of splice
. Suppose the same filter now needs to be reapplied based on some condition (or you want to tap into a channel). I can envision
cIn, cOut := unsplice c
go myFilter(cIn, cOut)
The effect is as follows:
before: producer =>c=> consumer
after: producer =>c ~~ cIn=> filter =>cOut ~~ c=> consumer
The reader and writer of c need not be aware of the existence of the filter tap. Any buffered data in c at the time of unsplice can be read from cIn by the filter (and not by the consumer of c), since a decision to insert a filter can be triggered based on received items from c.
PS:
cIn, cOut := make(chan int, ...)
. Too late now. Another useful function may have been to allow creating bidirectional channels. E.g. c1, c2 := make(chan int, 5, "bidirectional")
so that writes on c1 get read from c2 and vice versa. This would make unsplice even more useful (e.g. inserting an protocol or encryption module based on negotiation).@bakul
[A]ny buffered data in c1 can be copied out to c2 at the time of splice (though that would potentially block splice until there is enough space in c2!).
That does seem reasonable to me and a bit more user-friendly. I have been wondering about that restriction, too. Maybe there's a good reason for it that I'm not seeing.
This is not strictly a clone operation since not all the functionality is cloned.
That's true. Perhaps split
would be a more descriptive name. (And split/splice is a fun word combo.)
fanin/fanout
This could be achieved by multiple calls to splice
/clone
.
An equally interesting operation would be the inverse of splice. Suppose the same filter now needs to be reapplied based on some condition (or you want to tap into a channel).
Could you file a separate proposal for that?
Would've preferred it if these things were possible to implement in user code, rather than build in the language.
The equivalent functionality is possible in user code, but it requires additional channels and goroutines. This proposal would avoid that overhead and make it simpler to build more complex topologies.
Could you file a separate proposal for that?
I am just exploring features that may dovetail with this proposal at the moment.
The equivalent functionality is possible in user code, but it requires additional channels and goroutines.
What I was getting at is to be able to implement something similar in user code without needing additional channels or goroutines. The proposed changes seem non-trivial and are perhaps better suited as a library component of some sort. Plus I would like to be able to avoid context switching overhead for simple but useful operations. For example:
func sum(a, b <-chan int) <-chan int {
c := make(chan int, ...)
go func() for { c <- (<- a) + (<- b) }()
return c
}
[yeah, I know this has problems; the code is just to illustrate the idea]
Conceptually this can be implemented without a goroutine: a receive on c will result in a receive on a nd b and if either blocks, so does the receive on c. The receive returns when both a and b yield a value and a sum can be written into c. Symmetrically, if a value is written into a or b, the sum can be done apriori and written into c. Syntax wise c := a + b
captures this perfectly but I don't expect any Go authors to go for that!
IMHO a pipe is a better abstraction than an iterator in other languages and it would be nice if it is available without needing to use goroutines. Since this proposal started from a suggestion by Doug McIlroy, I couldn't help wondering if we can make Go channels as easy to use as pipes are in a shell!
On the other hand I can't think of a lower level building block to do this as yet.
Regarding splice
-like functionality: I have sometimes encountered the case in which I have two channels, one from which I'm supposed to read and another I'm supposed to write to but -and here's the tricky bit- only when the two operations (receive and send) are not going to block. Something similar to this, but in the following example the send on writeCh
could block so it's not quite what I needed:
for {
select {
// ... other cases ...
case e := <-readCh:
// ...optionally drop/manipulate e here...
writeCh <- e
}
}
There are some ways in which this can be written to satisfy the non-blocking requirements but they tend to get hairy and in any case they (in general) require buffering at least one element (even if both channels are unbuffered) or creating dedicated "channel pump" goroutines.
I am not sure if this is a common enough scenario, but I wanted to submit it for consideration anyway because, as I mentioned above, I found myself in this situation a couple of times. I am aware that refactoring the code would solve the problem: to my credit, in both cases, when I encountered this the channels were provided by libraries, so there wasn't much I could do.
I've ran into the exact problem @CAFxX described, and indeed solved it by completely refactoring everything. But I think it'd be really useful to have completely unbuffered pipelines. May I suggest the following extension to splice:
splice a b func
Which would be of type
chan a, chan b, func(b) a
This essentially imlements a completely unbuffered map
operation on channels. Maybe that opens a whole new can of worms about things like filter
and reduce
, but nonetheless I think it is useful to be able to write pipelines without implicit buffers.
This essentially [implements] a completely unbuffered
map
operation on channels.
Polymorphic functions (#15292) seem like a better fit for that than broad built-in operators.
The operators that @robpike proposes interact with channel internals in an interesting way: they potentially allow the runtime to alias buffers between channels.
A splice
variant that feeds through a transform function, in contrast, requires an extra allocation for each channel (to store the transformation function) and does not allow repeated operations to collapse: you can't skip any function in the chain. That makes it different enough that I wouldn't want to call it the same operator.
The splice operation is a dangerous notion for a type that's intended to be shared between goroutines (how can c1 become write-only without invalidating declarations?), and generally unnecessary, for example https://play.golang.org/p/YaqdWEwcxQa eliminates Copy by giving rat and channels thereof a common interface. On a related note, the operations on series as implemented don't need a split that spawns goroutines, at most one term of a source series needs to be "buffered" and the reads happen in a well-defined order. The common interface was inspired by a footnote in "Squinting at Power Series" and the inference about terms and order is from same (I feel indebted to the author, his work made objecting to this proposal one of the most challenging (the paper's well above my pay grade) and rewarding experiences I've had as a humble student of computer science).
The clone operation as proposed makes a key property of channels, synchronization, ambiguous. If a shared buffer is what's desired, a channel is a great way to synchronize access to it, as the original implementation of Split does (where the buffer is basically a linked list of goroutine stacks guarded by channels used as mutexes). Arbitrarily large channel capacity and demand-based allocation of channel buffers could also be a useful alternative to clone, https://github.com/golang/go/issues/20352 proposes explicit creation of channels of infinite capacity.
My thanks again to the proposer and for powser[12].go.
EDIT:
Could clone
be called tee
? Clone sounds like a variation of make
that duplicates the buffer. And it's easily misread as close
.
@j7b for "demand-based allocation of channel buffers" see #20868
Presumably after executing c1 := clone(c)
, writing a value to c
will block until there is space available for both c
and c1
.
What is the buffer size of the result of calling clone
? Do you get the same buffer size as the argument? Is clone
permitted on unbuffered channels? It could still work though you could only write to the original channel when each clone had a goroutine reading from it.
What happens if you write to the channel that is returned by calling clone
? What happens if you read from a channel that is passed to splice
? Do the operations panic or simply fail in some way?
Similar, but maybe not the exact same problem (re: splice) that's cropped up a few of my projects: I want to select
from an arbitrary number of chans. It's actually possible, and ugly, to do this via the reflect
package by aggregating SelectCase
's and invoking reflect.Select
. Did I mention that it's ugly? It would be more natural to express along these lines:
select {
case <-done:
case signal := <-control:
case i, v, ok := <-sliceOfIdenticallyTypedChans:
// i, v, ok are the same as returned by `reflect.Select`
// https://golang.org/pkg/reflect/#Select
}
@jdef Thanks, but that is really a different problem that should be discussed on a different issue. This one is about splice
and clone
.
splice c1 c2
where channel c1 contains no buffered data, identifies the write end of channel c1 with that of channel c2. Channel c1 becomes write-only.
How does one determine that c1 contains no buffered data? maybe the tieing of write ends should happen after buffered data made it out or be restricted to unbuffered channels (which would be simpler but more limited in applicable use cases).
Also, for use case of temporarily pass-through copying a transformer/filter, one would need to undo the splice somehow.
Also, for use case of temporarily pass-through copying a transformer/filter, one would need to undo the splice somehow.
Actually, one could add
splice c1 c2 N
which would block for N elements passing through and then return with c1
and c2
back to normal.
In Doug's words:
====
splice c1 c2
where channel c1 contains no buffered data, identifies the write end of channel c1 with that of channel c2. Channel c1 becomes write-only.
I find the term splice
a bit off, it's more of a fuse
or tie
. To me, splice
has meaning for magnetic tape, where a piece of one roll of tape is placed into some position of another.
Hi @wsc1, Long before magnetic tape, splice was used for weaving the ends of two ropes together to become one: Splice the main brace! It seems apt, and is in use for a similar operation in Linux's splice(2). https://en.wikipedia.org/wiki/Splice
The splice operation is a dangerous notion for a type that's intended to be shared between goroutines (how can c1 become write-only without invalidating declarations?),
If splice c1 c2 just functions as a return from a goroutine, I don't see how this could be a problem
similarly if splice c1 c2 N blocks.
The clone operation as proposed makes a key property of channels, synchronization, ambiguous.
As initially stated, yes, but I think @jimmyfrasche's take:
I would have expected cloning a channel to behave something more like:
For an unbuffered chan, c, c <- v blocks until c and all clones of c have received v.
For a buffered chan, c, c <- v blocks if c or any clones of c have filled their buffer.
fixes the ambiguity (and potentially infinite buffer size) nicely.
Hi @wsc1, Long before magnetic tape, splice was used for weaving the ends of two ropes together to become one: Splice the main brace! It seems apt, and is in use for a similar operation in Linux's splice(2). https://en.wikipedia.org/wiki/Splice
I had forgotten the nautical sense. Thanks, it does indeed seem like a good name in light of that.
Given that these ideas for splice and clone are in part motivated by stream processing chains, perhaps it makes sense to consider them in conjunction with making channels have an extra parameter to specify a block size in addition to channel capacity? Would others consider this issue an appropriate place to discuss this?
Here are a couple of suggestions made by Doug McIlroy, original author of test/chan/powser[12].go and instigator of pipes in Unix. They are intriguing.
In Doug's words:
====
splice c1 c2
where channel c1 contains no buffered data, identifies the write end of channel c1 with that of channel c2. Channel c1 becomes write-only.clone c
makes a new read-only channel positioned at the same place in the data stream as readable channel c. Thereafter both channels read the same data sequence from the stream at unrelated rates.Splice allows a filter to elide itself from a pipeline when it has no further substantive work to do, rather than going into a copy loop.
Clone enables (buffered) fanout of a data stream. Buffered data items may be garbage-collected when they have been delivered to all readers.
These two capabilities are of general utility in stream processing. golang.org/test/chan/powser1.go is one application that could profitably use them--to eliminate the many instances of Copy and also the per-element go-routines spawned for every Split. Some Unix variants have offered pipe splicing, and fanout is a staple of dataflow algorithms. The current workarounds consume an awful lot of bootless machine cycles.