dotnetcore / CAP

Distributed transaction solution in micro-service base on eventually consistency, also an eventbus with Outbox pattern
http://cap.dotnetcore.xyz
MIT License
6.61k stars 1.28k forks source link

DotNetCore.Cap.NATS do not handle reconnect if the nats server is forcibly shutdown and then restarted #1449

Closed davidterins closed 9 months ago

davidterins commented 9 months ago

Issue: DotNetCore.Cap.NATS do not handle reconnect if the nats server is forcibly shutdown and then restarted

Reproduction steps

  1. Create a subscriber and a publisher using DotNetCore.CAP.NATS and DotNetCore.CAP.InMemoryStorage where the publisher sends messages to a topic with an interval
  2. Run the application and the nats-server
  3. Shut down the nats-server.exe by killing the process... e.g. via windows via TaskManager
  4. Keep the server shut down while some messages are being sent by the publisher.
  5. Start nats-server

Expected behavior here is that the messages sent by the publisher during shutdown of the server should be received by the consumer when nats-server i started again, but they will get stuck in the publishers outbox since the subscriber will not be able to reconnect back on to the serve again

Suggested change

The issue is in the following method DisconnectedEventHandler in NATSConsumerClient.cs handling the disconnected events of the nats server.

Current implementation

        private void ClosedEventHandler(object? sender, ConnEventArgs e)
        {
            if (e.Error is { Message: "Server closed the connection." })
            {
                var logArgs = new LogMessageEventArgs
                {
                    LogType = MqLogType.ConnectError,
                    Reason = e.Error.ToString()
                };
                OnLogCallback!(logArgs);
            }
        }

New implementation

        private void ClosedEventHandler(object? sender, ConnEventArgs e)
        {
            if (e.Error is null) return

            var logArgs = new LogMessageEventArgs
            {
                 LogType = MqLogType.ConnectError,
                 Reason = e.Error.ToString()
             };
             OnLogCallback!(logArgs);
        }

I have noticed that this ClosedEventHandler is triggered in a lot of cases where the e.Error is null and I assume that is why the explicit message check was set to if (e.Error is { Message: "Server closed the connection." }) in commit c63c2891, however this message do not seem to cover all the server disconnect cases. Therefore I suggest to just return when e.Error is null rather than adding another explicing message check for "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..", which is sent when nats server is shutdown forcibly.

yang-xiaodong commented 9 months ago

I have noticed that this ClosedEventHandler is triggered in a lot of cases where the e.Error is null and I assume that is why the explicit message check was set to if (e.Error is { Message: "Server closed the connection." }) in commit https://github.com/dotnetcore/CAP/commit/c63c28916af2c5a03d33aef951ff99e2618e341e,

You're right, there could be a number of reasons for this. I'm not sure if the changes you made will affect the error being null and the connection still being present.

I just looked at the NATS.Client repo code to see if there were any possible error messages, and I found that they now have a new V2 version of the client library.

https://github.com/nats-io/nats.net.v2

If you're interested in switching to the V2 version of NATS.Client.JetStream, please let me know.

davidterins commented 9 months ago

Looks legit, I would be up for trying it out if it was implemented on a feature branch. 👍

Another questions... Is there a plan on when there will be a new official release of the CAP nugets? (without updating to nats v2)

yang-xiaodong commented 9 months ago

The current version has reached 8.0 and is currently in the preview version. If you need it, I can release a preview version of 8.0 for you. The official version may take another 1-2 weeks, I'm no sure.

davidterins commented 9 months ago

I see. yes, a preview version would be great for now.

yang-xiaodong commented 9 months ago

Version 8.0.0-preview-218723843 is released to nuget, please check it in a few minutes.

davidterins commented 9 months ago

Great, thank you!

yang-xiaodong commented 9 months ago

Fixed in version 8.0.0