dedis / onet

Overlay Network for distributed protocols
GNU Lesser General Public License v3.0
50 stars 29 forks source link

Make SendToChildrenInParallel asynchronous #682

Open ineiti opened 2 years ago

ineiti commented 2 years ago

The current SendToChildrenInParallel is synchronous: it waits for each children to receive the message OR to return an error. Unfortunately some connections only return an error after a timeout. For the blscosi protocol, this timeout is too long for the protocol to work correctly.

So SendToChildrenInParallel should return immediately and still allow to be notified:

@tharvik proposed the following code:

    ret := make(chan error)
    childsRets := make(chan error)
    go func() {
        for range p.Children() {
            if err := <-childsRets; err != nil {
                ret <- xerrors.Errorf("send to children: %v", err)
            }
        }
        close(ret)
    }()
    for _, c := range p.Children() {
        go func(tn *onet.TreeNode) {
            childsRets <- p.SendTo(tn, msg)
        }(c)
    }
    return ret

As this is a breaking change, one could add the following to SendToChildrenInParallel:

Deprecated: this is synchronous with eventual connection errors, so please use `SendToChildrenInBackground`.

Naming propositions (add yours):

Once this is merged, the calls to SendToChildrenInParallel should be replaced with this one. Also the hacky ones in the cothority/blscosi/protocol/sub_protocol.go and cothority/messaging/propagate.go.

nkcr commented 2 years ago

With the proposed solution we are open to a zombie goroutine in case the caller is not fully reading the channel. I see two solutions:

  1. Concatenate the errors, and return a channel with one error, if any
  2. Return a buffered chan so that we are not concerned about blocking when filling the chan and avoid a loop

Solution 2 seems easier:

    ret := make(chan error, len(p.Children))

    wait := sync.WaitGroup{}
    wait.Add(len(p.Children))

    for _, c := range p.Children() {
        go func(tn *onet.TreeNode) {
            defer wait.Done()
            ret <- p.SendTo(tn, msg)
        }(c)
    }

    go func() {
        wait.Wait()
        close(ret)
    }()

    return ret

Then the caller can do:


ret := SendToChildrenInBackground(...)
for err, more := range ret {
    if more {
        // do something with err
    }
}
ineiti commented 2 years ago

@tharvik what do you think of @nkcr 's proposal?

tharvik commented 2 years ago

Return a buffered chan so that we are not concerned about blocking when filling the chan and avoid a loop

Right, good idea, that do avoid having non-terminating goroutine.

I would avoid sending nil error on the channel

func SendToChildrenInBackground(msg interface{}) <-chan error {
    ret := make(chan error, len(p.Children()))

    var wait sync.WaitGroup
    wait.Add(len(p.Children()))

    for _, c := range p.Children() {
        go func(tn *onet.TreeNode) {
            defer wait.Done()
            if err := p.SendTo(tn, msg); err != nil {
                ret <- err
            }
        }(c)
    }

    go func() {
        wait.Wait()
        close(ret)
    }()

    return ret
}

This way, the caller can simply

for err := range SendToChildrenInBackground(...) {
    // some error happened
}
ineiti commented 2 years ago

The nice thing about having a potential nil in the channel is that the caller can also count how many successful connections have been made. So in case he needs t out of n successful connections to continue, the caller can simply count the nils and then go on, dropping the rest of the results. That's something I think might be used in some cases.

tharvik commented 2 years ago

Alright, let's add the next helper then, to easily choose between the two behaviors

func FilterNilError(recv <-chan error) <-chan error {
    ret := make(chan error) // should we use cap(recv) here?

    go func() {
        for err := range recv {
            if err != nil {
                ret <- err
            }
        }
        close(ret)
    }()

    return ret
}
ineiti commented 2 years ago

That makes the main function quite small:

func (n *TreeNodeInstance) SendToChildrenInBackground(
    msg interface{}) <-chan error {
    ret := make(chan error, len(n.Children()))

    for _, c := range n.Children() {
        go func(tn *TreeNode) {
            ret <- n.SendTo(tn, msg)
        }(c)
    }

    return ret
}

Or we make a SendToChildrenInBackgroundErrorOnly method (better name needed) that calls the SendToChildrenInBackground.

tharvik commented 2 years ago

That makes the main function quite small: ...

No, it needs the WaitGroup, because we really have to close the channel, otherwise, we can't range on it.

Or we make a SendToChildrenInBackgroundErrorOnly method (better name needed) that calls the SendToChildrenInBackground.

Having a helper allows us to reuse this pattern without linking it to SendToChildrenInBackground.

And to come back to the need to have nil errors: that's quite a specific use case, do you have anywhere it's needed now? We can always add it latter should the need arise.

ineiti commented 2 years ago

I thought of using it in a

childrenErrors := onet.SendToChildrenInBackground(...)
for {
   select {
   case err := <- childrenErrors:
   case other := <- otherStuff:
   }
}

If I remember correctly, closing the channel would always choose the first case, correct? And the case other would never be called?

If that is true, then I prefer not to have the close in the method. Because that's how it's used in the blscosi protocol.

nkcr commented 2 years ago

If I remember correctly, closing the channel would always choose the first case

No, there are absolutely no priority in select statement. Use default if you want that.

I seems you're trying to be too smart there. The function should have a simple behavior: return result of sends on a channel and close it when its done. period. Let the caller implement its own utility function if needed.

ineiti commented 2 years ago

I like being smart...

Anyway - if you both think the channel should be closed, I'll add some additional logic in the 2 places (out of 6) where this is needed.

And for the nil use-case, it could be used in the propagate-protocol. But I'm not sure it's a good thing to do there.

Now I just need an easy way to write the test for this and some time...