cyanfish / grpc-dotnet-namedpipes

Named pipe transport for gRPC in C#/.NET
Apache License 2.0
183 stars 47 forks source link

ServerStreamPool.Start - NamedPipeServerStream instances might not be created when the method exits #44

Closed pdupont1 closed 6 months ago

pdupont1 commented 1 year ago

Having an issue in my automated tests when calling NamedPipeServer.Start and directly afterwards calling an RPC method on my client. Sometimes the client fails to connect with an RpcException Status(StatusCode="Unavailable", Detail="failed to connect to all addresses"). I think the cause for this is that ServerStreamPool is starting background threads to create servers and the servers might not be created and listening for connections when I would expect them to (After Start() exits).

I fixed it for now by adding a counter to the ServerStreamPool.WaitForConnection method:

private void WaitForConnection(NamedPipeServerStream pipeServer)
{
    ++_serversWaiting;
    try
    {
        pipeServer.WaitForConnectionAsync(_cts.Token).Wait();
    }
    catch (Exception)
    {
        try
        {
            pipeServer.Disconnect();
        }
        catch (Exception)
        {
            // Ignore disconnection errors
        }
        pipeServer.Dispose();
        throw;
    }
    finally
    {
        --_serversWaiting;
    }
}

and changing the ServerStreamPool.Start() method to this:

public async Task StartAsync()
{
    if (_stopped)
    {
        throw new InvalidOperationException(
            "The server has been killed and can't be restarted. Create a new server if needed.");
    }
    if (_started)
    {
        return;
    }

    for (int i = 0; i < PoolSize; i++)
    {
        StartListenThread();
    }

   using (CancellationTokenSource waitServersToken = new CancellationTokenSource(TimeSpan.FromSeconds(10)))
    {
        while (_serversWaiting < PoolSize && !waitServersToken.IsCancellationRequested)
        {
            try
            {
                await Task.Delay(5, waitServersToken.Token);
            }
            catch (TaskCanceledException) { }
        }

        if(waitServersToken.IsCancellationRequested)
        {
            throw new TimeoutException("Timed out waiting for servers to start.");
        }
    }

    _started = true;
}

The workaround seems to have fixed the issue for my use case.

cyanfish commented 12 months ago

I'd be interested to see a test case if you have one that can reproduce it. The client is supposed to wait for the timeout before failing. I do feel like I've seen that fail in the past but I haven't had any luck reproducing it lately.

cyanfish commented 6 months ago

I've added some custom retry logic in 3.0.0 that I believe should solve issues like this.