creachadair / jrpc2

A Go implementation of a JSON-RPC 2.0 server and client.
BSD 3-Clause "New" or "Revised" License
62 stars 10 forks source link

http client doesn't retry after error #118

Open 2opremio opened 3 months ago

2opremio commented 3 months ago

I am using jrpc2 v1.2.0

This test:

package main

import (
    "context"
    "fmt"
    "net/http"
    "testing"

    "github.com/creachadair/jrpc2"
    "github.com/creachadair/jrpc2/handler"
    "github.com/creachadair/jrpc2/jhttp"
    "github.com/stretchr/testify/require"
)

func TestClientReconnectsIfConnectionRefused(t *testing.T) {
    ch := jhttp.NewChannel("http://localhost:8000", nil)
    client := jrpc2.NewClient(ch, nil)
    defer client.Close()

    var result string
    err := client.CallResult(context.Background(), "Hello", nil, &result)
    require.Error(t, err)

    b := jhttp.NewBridge(handler.Map{
        "Hello": handler.New(func(context.Context) (string, error) {
            return "Hello, world!", nil
        }),
    }, nil)
    defer b.Close()
    go http.ListenAndServe("localhost:8000", b)

    err = client.CallResult(context.Background(), "Hello", nil, &result)
    require.NoError(t, err)
    fmt.Println(result)
}

fails with:

=== RUN   TestClientReconnectsIfConnectionRefused
            Error:          Received unexpected error:
                            Post "http://localhost:8000": dial tcp [::1]:8000: connect: connection refused
            Test:           TestClientReconnectsIfConnectionRefused
--- FAIL: TestClientReconnectsIfConnectionRefused (0.00s)

FAIL

It works if I recreate the after the server, i.e.:

func TestClientReconnectsIfConnectionRefused(t *testing.T) {
    ch := jhttp.NewChannel("http://localhost:8000", nil)
    client := jrpc2.NewClient(ch, nil)
    defer client.Close()

    var result string
    err := client.CallResult(context.Background(), "Hello", nil, &result)
    require.Error(t, err)

    b := jhttp.NewBridge(handler.Map{
        "Hello": handler.New(func(context.Context) (string, error) {
            return "Hello, world!", nil
        }),
    }, nil)
    defer b.Close()
    go http.ListenAndServe("localhost:8000", b)

    ch2 := jhttp.NewChannel("http://localhost:8000", nil)
    client2 := jrpc2.NewClient(ch2, nil)
    defer client2.Close()
    err = client2.CallResult(context.Background(), "Hello", nil, &result)
    require.NoError(t, err)
    fmt.Println(result)
}

I don't know if it's expected behavior, but I was expecting the client to try again after getting a connection-refused error (Just like Go's http client).

I spent hours debugging what the problem was, see https://github.com/stellar/soroban-rpc/pull/207

creachadair commented 3 months ago

It isn't necessary to create a new channel, the problem you're seeing arises in the jrpc2.Client: It receives an error trying to read from the channel so it closes the client, and that is a persistent status.

It should also work to say, for example:

client2 := jrpc2.NewClient(ch, nil)  // N.B. reusing the same channel

The jhttp.Channel is somewhat limited in what it can do to report errors: It can't[1] block on sending the request, because that will not return until the whole request is complete. So instead it reports all errors via its Recv, with the side effect you noted for the Client.

Usually when a stream breaks, it's not recoverable—and the Client doesn't have any deeper knowledge of its Channel so it can't tell that in this one case it would be OK to keep trying. For a socket, however, that would lead to the client looping forever reading EOF or EBADF or whatever the underlying transport says when it's closed/invalid.

One way to work around this might be to wrap the client rather than the channel, e.g.,

// Warning: schematic, untested

type Client struct{
   ch *jhttp.Channel
   cli *jrpc2.Client
   opts *jrpc2.ClientOptions
}

func NewClient(ch *jhttp.Channel, opts *jrpc2.ClientOptions) *Client {
   return &Client{ch: ch, cli: jrpc2.NewClient(ch, opts), opts: opts}
}

func (c *Client) CallResult(ctx context.Context, method string, params, result any) error {
   for ctx.Err() == nil {
      err := c.cli.CallResult(ctx, method, params, result)
      if err == nil {
         return nil
      } else if !isRetryable(err) {
         return err
      }
      backoffForRetry()
      // Maybe also: limit retry count

      c.cli = jrpc2.NewClient(c.ch, c.opts)
   }
   return ctx.Err()
}

or words to that effect. (You might also need a lock if you want it to be usable concurrently, and so on).

[1]: Or at least, not using the standard library's HTTP client. And reimplementing all of HTTP is out of scope here 🙂

2opremio commented 3 months ago

I see, thanks. I was expecting the client to work like Go's http client (which doesn't bail out if it cannot connect)

creachadair commented 3 months ago

I see, thanks. I was expecting the client to work like Go's http client (which doesn't bail out if it cannot connect)

Unfortunately not, the client assumes a channel that's already connected, so there's no separate dial phase the client could use to detect that.