Azure / DotNetty

DotNetty project – a port of netty, event-driven asynchronous network application framework

Reconnect client after connection was closed #245

Open robertmircea opened 7 years ago

robertmircea commented 7 years ago

What is the best (or recommended) way to reconnect to a remote TCP endpoint after the connection was lost? I need to reconnect with a specific delay (exponential or fixed).

nayato commented 7 years ago

That's out of scope for DotNetty. You'd need to place a channel handler in the channel's pipeline that handles ExceptionCaught and ChannelInactive and reacts when these methods are called. You'd decide whether to reconnect based on the evidence (was the closure caused by an error, or by the user's own call to close the channel, for instance) and then start another channel.

robertmircea commented 7 years ago

Thank you, I understand the principle, but what are the actual steps to reconnect? Do I need to create a new Bootstrap() instance when triggering a new ConnectAsync? What happens if the bootstrap.ConnectAsync(...) call cannot resolve the host or times out while connecting?

A small sample that handles the reconnect efficiently would be appreciated...

nayato commented 7 years ago

@robertmircea Bootstrap can be used to open multiple connections; just call bootstrap.ConnectAsync to create a new channel. Any error during the ConnectAsync call will come through as a faulted task. Unfortunately I really have no time to write down a sample. You basically need to monitor ChannelInactive in a channel pipeline (write a channel handler for this) and, when it's triggered, connect using the bootstrap again. How to handle failures to connect is up to you; a retry policy with back-off is a common way of doing things.
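
A minimal sketch of such a retry policy with back-off, not taken from the thread. `ReconnectPolicy`, `BackoffDelay` and `ConnectWithBackoffAsync` are hypothetical names, and the `connect` delegate is an assumption standing in for a call such as `() => bootstrap.ConnectAsync(endPoint)`; nothing here is a DotNetty API. It relies only on the behavior described above: a failed connect surfaces as a faulted task, which becomes an exception when awaited.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ReconnectPolicy
{
    // Delay before retrying after the given 0-based attempt:
    // baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        // Clamp the exponent so the factor stays in a sane range.
        double factor = Math.Pow(2, Math.Min(attempt, 30));
        double ms = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(ms);
    }

    // Calls connect() until it succeeds or maxAttempts is exhausted.
    // The last failure is rethrown to the caller.
    public static async Task<T> ConnectWithBackoffAsync<T>(
        Func<Task<T>> connect, int maxAttempts, TimeSpan baseDelay, TimeSpan maxDelay,
        CancellationToken token = default)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                // A faulted connect task is observed here as an exception.
                return await connect().ConfigureAwait(false);
            }
            catch (Exception) when (attempt + 1 < maxAttempts && !token.IsCancellationRequested)
            {
                // Wait before the next attempt; cancellation aborts the wait.
                await Task.Delay(BackoffDelay(attempt, baseDelay, maxDelay), token).ConfigureAwait(false);
            }
        }
    }
}
```

Passing the same value for `baseDelay` and `maxDelay` gives a fixed delay instead of an exponential one, which covers both options asked about in the original question.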

battermaster commented 6 years ago

I tested bootstrap.ConnectAsync: once the connection was lost, it could not reconnect anymore.

yindongfei commented 6 years ago

Yes, I have the same problem. When I create a bootstrap inside ChannelInactive, it cannot reconnect anymore until I restart the server.

lwg-galuise commented 6 years ago

Hello,

I too am seeing the same kind of behavior from a "forced disconnect." The MatriX.vnext library depends on DotNetty, and while testing an XMPP client (to be encapsulated in a .NET 4.6.2 Windows Service) I wanted to simulate a "network failure" / disconnect. Upon pulling the network cable from my machine I noticed that it takes quite a while for DotNetty to recognize that connectivity has been lost (about 30 seconds or more, as mentioned in issue #238).

That said I too am experiencing a "hang" when Bootstrap.ConnectAsync is called. Taking your suggestion I created a Channel Handler to capture both the ExceptionCaught and the ChannelInactive events:

private class MyClientChannelHandler : ChannelHandlerAdapter
{
    public override void ExceptionCaught(
            IChannelHandlerContext context, Exception exception)
    {
        base.ExceptionCaught(context, exception);
        log.Error(exception,"Channel exception caught: " +
            exception.Message);
    }

    public override void ChannelInactive(IChannelHandlerContext context)
    {
        log.Error("Caught channel inactive!");
        base.ChannelInactive(context);
    }
}

And while that ChannelHandler does appear to capture the "forcibly closed" exception that occurs when the network cable is pulled, nothing further is reported on any reconnect attempt. Therefore it doesn't appear that the ChannelHandler approach is the solution to this issue.

This seems to be a real issue when trying to deal with any kind of adverse network conditions.

I even tried doing the following within the ExceptionCaught method (to no avail):

private class MyClientChannelHandler : ChannelHandlerAdapter
{
    public override void ExceptionCaught(
            IChannelHandlerContext context, Exception exception)
    {
        base.ExceptionCaught(context, exception);
        log.Error(exception,"Channel exception caught: " +
            exception.Message);
        context.Channel.ConnectAsync(
            context.Channel.RemoteAddress).Wait(); // attempt to re-connect

    }

    public override void ChannelInactive(IChannelHandlerContext context)
    {
        log.Error("Caught channel inactive!");
        base.ChannelInactive(context);
    }
}

Kind of at a loss at this point, as nothing I do seems to handle the disconnect, which leaves the application kind of "dead in the water." Forgive me if I have missed something obvious. I've tried looking through DotNetty's source a bit to see if I'm missing something simple, but nothing obvious comes up. Any help that can be offered would be greatly appreciated. Thanks.

mariomeyrelles commented 6 years ago

To be very honest, I'm in the same situation. When the server closes the connection and shortly afterwards opens it again, the client is not able to connect again. I tried to intercept both ExceptionCaught and ChannelInactive. I tried to reconnect using this code:


        public override void ChannelInactive(IChannelHandlerContext context)
        {
            base.ChannelInactive(context);
            context.ConnectAsync(context.Channel.RemoteAddress).Wait() ;

        }

Doing so, I got the following exception, wrapped in an outer AggregateException:


{DotNetty.Transport.Channels.ClosedChannelException: I/O error occurred.}
    Data: {System.Collections.ListDictionaryInternal}
    HResult: -2146232800
    HelpLink: null
    InnerException: null
    Message: "I/O error occurred."
    Source: null
    StackTrace: null
    TargetSite: null

I also tried the dirty way of doing a complete bootstrap from inside the handler:


       public override void ChannelInactive(IChannelHandlerContext context)
       {

            base.ChannelInactive(context);

            Console.ForegroundColor = ConsoleColor.DarkCyan;
            Console.WriteLine("[ConnectionWatcherHandler] Canal Inativo. " + context);
            Console.ResetColor();

            // context.ConnectAsync(context.Channel.RemoteAddress).Wait() ; // doesn't work

            var group = new MultithreadEventLoopGroup(1);
            var bootstrap = new Bootstrap();
            bootstrap
                .Group(group)
                .Channel<TcpSocketChannel>()
                .Option(ChannelOption.TcpNodelay, true)
                .Handler(new ActionChannelInitializer<ISocketChannel>(channel =>
                {
                    IChannelPipeline pipeline = channel.Pipeline;

                        pipeline.AddLast("watcher", new ConnectionWatcherHandler());
                        pipeline.AddLast("decoder", new LengthFieldBasedFrameDecoder(ByteOrder.LittleEndian, ushort.MaxValue, 0, 4, 0, 4, true));
                        pipeline.AddLast("slt", new SltMessageHandler());
                }));

            IChannel channel2 = bootstrap.ConnectAsync(new IPAddress(new byte[] { 127, 0, 0, 1 }), 9999).GetAwaiter().GetResult();

            Console.WriteLine(channel2);

        }

When I tried this, I didn't get any error or response. It keeps waiting forever. I was only able to make this work when I downloaded the project, referenced it and debugged step by step today. It seems that when I do things "slower", step by step, when debugging bootstrap.ConnectAsync, things start to synchronize and work again.

I'm stuck.

0Lucifer0 commented 6 years ago

Putting the reconnection step inside a task will work. I tried it like this and it works:

public override void ChannelInactive(IChannelHandlerContext context)
{
    Logger.Log.Warn(string.Format(LogLanguage.Instance.GetMessageFromKey(LanguageKey.UNREGISTRED_FROM_MASTER)));
    Task.Run(() => _onConnectionLost());
}

kuzmenko-oleksandr-km commented 6 years ago

@0Lucifer0 does it work okay? Does it leak memory, or are there any other issues with this approach? It feels really hacky, to be perfectly honest; I wonder what the right way to do it is. :(

0Lucifer0 commented 6 years ago

I guess it does. It doesn't seem to leak memory, but yes, it's hacky and definitely not what we should use. I only posted this comment because it's the only way that works for me, and it seems people are facing the same issue, so it can serve while waiting for a fix or a better way.

tjakopan commented 6 years ago

I believe I'm having the same issue. I'm trying to implement "uptime" example from netty (Uptime ‐ implement automatic reconnection mechanism) - https://github.com/netty/netty/tree/4.1/example/src/main/java/io/netty/example/uptime

On the reconnection attempt I'm getting the following error:

System.AggregateException: One or more errors occurred. ---> DotNetty.Transport.Channels.ClosedChannelException: I/O error occurred.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at DotNetty.Transport.Bootstrapping.Bootstrap.d15.MoveNext()
   --- End of inner exception stack trace ---
---> (Inner Exception #0) DotNetty.Transport.Channels.ClosedChannelException: I/O error occurred.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at DotNetty.Transport.Bootstrapping.Bootstrap.d15.MoveNext()<---

The full example is here - https://github.com/tjakopan/DotNettyExamples Projects under the solution are Uptime.Server and Uptime.Client. Framework used is .NET v4.5.

tjakopan commented 6 years ago

I tried running the example in Visual Studio and figured out what the problem was. ch.Pipeline.AddLast(new IdleStateHandler(ReadTimeout, TimeSpan.Zero, TimeSpan.Zero), Handler))); On this line, VS displayed an exception saying that the handler is not sharable. Adding public override bool IsSharable => true; to the handler fixed the issue.

JetBrains Rider did not show that exception.

wayne2006 commented 5 years ago

I found that this problem is caused by the handler not being released. Try commenting out the handler; bootstrap.ConnectAsync then runs normally.

hanxinimm commented 5 years ago

Is there no better way to solve this problem?

nayato commented 5 years ago

I'm not sure you got the idea right. Channel is not re-usable. Once it's closed, you should not try to revive it. If you need to re-connect the steps should be like this:

You can also schedule continuation for channel's CloseCompletion (https://github.com/Azure/DotNetty/blob/dev/src/DotNetty.Transport/Channels/IChannel.cs#L46) instead of implementing channel handler. Note that it will always get completed successfully regardless of the cause of channel closure.
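
The idea above (never revive a closed channel; open a new one once the old channel's close task completes) can be sketched as a loop. This is only an illustration under stated assumptions: `ReconnectLoop` is a hypothetical class, the `connect` delegate stands in for something like `() => bootstrap.ConnectAsync(endPoint)`, and `closeCompletion` stands in for `ch => ch.CloseCompletion`. The sketch itself has no DotNetty dependency.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ReconnectLoop<TChannel>
{
    private readonly Func<Task<TChannel>> connect;          // opens a NEW channel on every call
    private readonly Func<TChannel, Task> closeCompletion;  // completes when that channel closes
    private readonly TimeSpan delay;                        // fixed pause between attempts

    public ReconnectLoop(Func<Task<TChannel>> connect,
                         Func<TChannel, Task> closeCompletion,
                         TimeSpan delay)
    {
        this.connect = connect;
        this.closeCompletion = closeCompletion;
        this.delay = delay;
    }

    // Number of successful connects so far.
    public int Connections { get; private set; }

    public async Task RunAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            try
            {
                // A closed channel is never reused; each pass creates a fresh one.
                TChannel channel = await connect().ConfigureAwait(false);
                Connections++;
                // Wait here until the channel is closed, whatever the cause.
                await closeCompletion(channel).ConfigureAwait(false);
            }
            catch (Exception)
            {
                // Connect failed (faulted task). Catching everything is a
                // simplification; a real client would log and filter here.
            }

            try
            {
                await Task.Delay(delay, token).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                break; // Stop requested while waiting.
            }
        }
    }
}
```

Because the loop awaits instead of blocking with `.Wait()` or `.GetResult()`, it does not tie up the thread that delivered the close notification. Cancelling the token is the hypothetical equivalent of a deliberate, user-initiated shutdown.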

hanxinimm commented 5 years ago

@nayato have you ever tried?

ConnectAsync hangs!

If you need it, I can upload all the source code!

caozhiyuan commented 5 years ago

Refer to this : https://github.com/caozhiyuan/DotNetty/blob/dev/src/DotNetty.Rpc/Client/ReconnectHandler.cs

yyjdelete commented 5 years ago

@hanxinimm Some extra information.

  1. You should always reuse the MultithreadEventLoopGroup if you don't shut it down, or you can just reuse the whole Bootstrap. https://github.com/caozhiyuan/DotNetty/blob/dev/src/DotNetty.Rpc/Client/NettyClient.cs#L22

  2. Check whether IsSharable of your STANEncoder/Decoder is true. (@nayato In fact, I think it's really hard to tell whether a handler is sharable without reading its source code or executing it. Unlike netty, DotNetty doesn't use an Attribute to mark this.)

hanxinimm commented 5 years ago

@yyjdelete @caozhiyuan

ex = {DotNetty.Transport.Channels.ClosedChannelException: I/O error occurred. at DotNetty.Transport.Bootstrapping.Bootstrap.DoResolveAndConnectAsync(EndPoint remoteAddress, EndPoint localAddress) at Hunter.STAN.Client.STANClient.ReconnectIfNeedAsync(EndP...

xhydongda commented 1 year ago

Using DotNetty 0.6.0 or 0.7.5 on .NET Framework 4.6.1 or 4.7.2, I can auto-reconnect like this:

try
{
    IChannel clientChannel = await bootstrap.ConnectAsync(serverEndPoint);
    if (clientChannel.Open)
    {
        channel = clientChannel;
        _ = clientChannel.CloseCompletion.ContinueWith((t, s) =>
        {
            scheduleReconnect();//auto reconnect when channel closed.
        }, this, TaskContinuationOptions.ExecuteSynchronously);
    }
    else
    {
        Logger.Warning($"clientChannel not open, retry after {reconnectDelay.Seconds} s", loggerSource);
        scheduleReconnect();//auto reconnect when connect failed.
    }
}
catch(Exception ex)
{
    Logger.Warning($"can't connect to {serverIp} port:{serverPort}: {ex.Message}, retry after {reconnectDelay.Seconds} s", loggerSource); 
    scheduleReconnect(); //auto reconnect when connect error.
}

private void scheduleReconnect()
{
    if (!disconnected)
    {
        eventLoopGroup.Schedule(async () => await connectAsync(), reconnectDelay);
    }
}

I also ran into "I/O error occurred", but in my case the problem was caused by my handler's ChannelActive code, where I wanted to reuse the hello message buffer like this:

ctor()
{
    helloBuffer = Unpooled.WrappedBuffer(helloMsg);// helloMsg doesn't change
}
public override void ChannelActive(IChannelHandlerContext ctx)
{
    this.ctx = ctx;
    if (helloBuffer != null)
    {
        ctx.WriteAndFlushAsync(helloBuffer);
    }
}

It works fine as a .NET 6 class library or console app, but not as a .NET Framework library: helloBuffer is freed after the reconnect, and I get the I/O error.

So, for me the reconnect works fine, but be careful when reusing an IByteBuffer.

liu3104009029 commented 1 year ago

@xhydongda Could you post the complete code?