FDio / govpp

Go toolset for the VPP.
Apache License 2.0
192 stars 81 forks source link

fix handling of EAGAIN reply return value #144

Closed fstreun closed 1 year ago

fstreun commented 1 year ago

The new VPP binapi for streams can return a reply with the return value EAGAIN. This value signals that VPP suspends/ends the transmission (e.g., because the call has taken too long and possibly blocks the master thread). The client is then expected to repeat the request, optimally providing the cursor to the last received data. So far, the implementation only returned an error without providing the cursor to the last received data point (in older versions, e.g., v0.4.0, govpp does not return an error and just accepts the incomplete response).

Now, when receiving a reply message the stream is closed and the reply message is returned. The caller can then check for the return value in the reply message and react appropriately.

Furthermore, new error codes are added.

ondrej-fabry commented 1 year ago

Hey @fstreun, thank you for your PR. I certainly see the problem with the generated client code for this type of VPP API calls, but I have few comments on the implementation in this PR and also a proposal for alternative solution.

Current Implementation of RPC client

From my quick investigation, it seems the current implementation of the generated RPC client was not even done in the same manner that simple VPP API calls (request-reply) are handled.

Request-Reply Calls

For example, in the code sample below, after a call to Invoke (which does SendMsg and RecvMsg) the reply message is always returned no matter what the value of Retval is.

func (c *serviceClient) CreateLoopback(ctx context.Context, in *CreateLoopback) (*CreateLoopbackReply, error) {
    out := new(CreateLoopbackReply)
    err := c.conn.Invoke(ctx, in, out)
    if err != nil {
        return nil, err
    }
    return out, api.RetvalToVPPApiError(out.Retval)
}

Dump-Details Calls

This is definitely not the case for the stream VPP API calls. The dump-details VPP API calls do not even check the Retval value of ControlPingReply at all, but this might have no significance at all:

func (c *serviceClient_SwInterfaceDumpClient) Recv() (*SwInterfaceDetails, error) {
    msg, err := c.Stream.RecvMsg()
    if err != nil {
        return nil, err
    }
    switch m := msg.(type) {
    case *SwInterfaceDetails:
        return m, nil
    case *memclnt.ControlPingReply:
        // <-- m.Retval value is not checked..
        err = c.Stream.Close()
        if err != nil {
            return nil, err
        }
        return nil, io.EOF
    default:
        return nil, fmt.Errorf("unexpected message: %T %v", m, m)
    }
}

Dump-Details-Reply Calls

And the dump-details-reply VPP API call, which this PR aims to fix, will discard the reply message in case of non-zero Retval value:

func (c *serviceClient_SwInterfaceTxPlacementGetClient) Recv() (*SwInterfaceTxPlacementDetails, *SwInterfaceTxPlacementGetReply, error) {
    msg, err := c.Stream.RecvMsg()
    if err != nil {
        return nil, nil, err
    }
    switch m := msg.(type) {
    case *SwInterfaceTxPlacementDetails:
        return m, nil, nil
    case *SwInterfaceTxPlacementGetReply:
        if err := api.RetvalToVPPApiError(m.Retval); err != nil {
            return nil, nil, err // <-- reply (second return value) returned as nil
        }
        err = c.Stream.Close()
        if err != nil {
            return nil, nil, err // <-- same happens here as well
        }
        return nil, m, io.EOF
    default:
        return nil, nil, fmt.Errorf("unexpected message: %T %v", m, m)
    }
}

Converting Retval to error

The api.RetvalToVPPApiError is a helper function for converting Retval values to error type. It actually returns api.VPPApiError that implements error interface with underlying type int32. The api.RetvalToVPPApiError will return nil error for zero Retval value otherwise it will return api.VPPApiError that is same value as original Retval field.

By converting non-zero Retval values to error and returning it along with any reply message, there is no need for users to do additional check of Retval value after checking the return error (less of error-prone boilerplate code needed) and it is also more obvious to users what the Retval value actually means (prints text representation of retval error codes).

Your Implementation

However, your implementation in this PR has completely removed this conversion behavior. This would force users to check both the returned error and also the value of Retval field. Here is short code sample of what users would have to do:

d, r, err := c.Recv()
if err == io.EOF {
    // handle final reply
} else if err != nil {
    // handle error
} else if r != nil && r.Retval != 0 {
    // handle non-zero retval
    if r.Retval == api.EAGAIN {
        // handle suspended stream
    } else { 
        // handle other retval errors
    }
}

Alternative Solution

I propose an alternative solution for this issue. Instead of removing the Retval field conversion from the RPC client code..

diff --git a/binapi/interface/interface_rpc.ba.go b/binapi/interface/interface_rpc.ba.go
index 8103d7cb..187d8794 100644
--- a/binapi/interface/interface_rpc.ba.go
+++ b/binapi/interface/interface_rpc.ba.go
@@ -423,12 +423,9 @@ func (c *serviceClient_SwInterfaceTxPlacementGetClient) Recv() (*SwInterfaceTxPl
    case *SwInterfaceTxPlacementDetails:
        return m, nil, nil
    case *SwInterfaceTxPlacementGetReply:
-       if err := api.RetvalToVPPApiError(m.Retval); err != nil {
-           return nil, nil, err
-       }
        err = c.Stream.Close()
        if err != nil {
-           return nil, nil, err
+           return nil, m, err
        }
        return nil, m, io.EOF
    default:

..which would make the usage less consistent with other VPP API calls, the reply message could be returned in all cases:

diff --git a/binapi/interface/interface_rpc.ba.go b/binapi/interface/interface_rpc.ba.go
index 8103d7cb..187d8794 100644
--- a/binapi/interface/interface_rpc.ba.go
+++ b/binapi/interface/interface_rpc.ba.go
@@ -423,12 +423,9 @@ func (c *serviceClient_SwInterfaceTxPlacementGetClient) Recv() (*SwInterfaceTxPl
    case *SwInterfaceTxPlacementDetails:
        return m, nil, nil
    case *SwInterfaceTxPlacementGetReply:
        if err := api.RetvalToVPPApiError(m.Retval); err != nil {
-           return nil, nil, err
+           return nil, m, err
        }
        err = c.Stream.Close()
        if err != nil {
-           return nil, nil, err
+           return nil, m, err
        }
        return nil, m, io.EOF
    default:

This way the user does not need to check the value of Retval field at all and only needs to check if the returned error is equal to api.EAGAIN:

d, r, err := c.Recv()
if err == io.EOF {
    // handle final reply
} else if err == api.EAGAIN {
    // handle suspended stream
} else if err != nil {
    // handle error
}

Let me know what you think.

ondrej-fabry commented 1 year ago

@fstreun This fix has been scheduled for upcoming release v0.8.0 and currently is the last outstanding change we want to include. So, to not keep the release blocked and in case you do not have time to update this PR until Monday, I will push the propose alternative solution to this PR.

ondrej-fabry commented 1 year ago

@fstreun This fix has been scheduled for upcoming release v0.8.0 and currently is the last outstanding change we want to include. So, to not keep the release blocked and in case you do not have time to update this PR until Monday, I will push the propose alternative solution to this PR.

This PR has been replaced by: https://github.com/FDio/govpp/pull/147

fstreun commented 1 year ago

@ondrej-fabry excuse the inactivity on this PR (I was on vacation). I have to remarks/questions.

Your second alternative seems also to be suitable. A complete example, including the actual handling of the EAGAIN, would be useful to see how many checks the user really has to do. (Should the user assume that reply r is non-nil if EAGAIN is returned as error?)

In the alternative solution, the stream is not closed if a return value is received. Is there any motivation for that (why not always close the stream)? When receiving EAGAIN, the user would repeat the request, which includes creating a new stream (NewStream). Would this lead to many open streams? Or am I missing something here?

ondrej-fabry commented 1 year ago

Your second alternative seems also to be suitable. A complete example, including the actual handling of the EAGAIN, would be useful to see how many checks the user really has to do. (Should the user assume that reply r is non-nil if EAGAIN is returned as error?)

I will update code in examples to demonstrate this and also add the relevant info to docs (user guide / troubleshooting).

In the alternative solution, the stream is not closed if a return value is received. Is there any motivation for that (why not always close the stream)? When receiving EAGAIN, the user would repeat the request, which includes creating a new stream (NewStream). Would this lead to many open streams? Or am I missing something here?

You are correct, I have noticed this as well, but forgot to fix it actually. Thanks for pointing it out. I will push a fix.