dotnetcore / CAP

Distributed transaction solution for microservices based on eventual consistency, also an event bus with the Outbox pattern
http://cap.dotnetcore.xyz
MIT License

NATS Connection Error after Restarting nats-server: Unable to Publish Messages #1542

Closed. yang-xiaodong closed this issue 2 months ago

yang-xiaodong commented 3 months ago
Hello @yang-xiaodong!

Can this be an issue for NATS as well? I'm investigating an issue we're having that seems to be occurring after I stop and restart nats-server.

2024-06-11 08:01:04.934 +02:00 [DBG] Transport connection healthy!
2024-06-11 08:01:05.339 +02:00 [ERR] NATS server connection error. -->  NATS.Client.NATSConnectionException: Server closed the connection.
   at NATS.Client.Connection.readLoop()
2024-06-11 08:01:05.340 +02:00 [ERR] NATS server connection error. -->  NATS.Client.NATSConnectionException: Server closed the connection.
   at NATS.Client.Connection.readLoop()
2024-06-11 08:01:05.341 +02:00 [ERR] NATS server connection error. -->  NATS.Client.NATSConnectionException: Server closed the connection.
   at NATS.Client.Connection.readLoop()
2024-06-11 08:01:34.955 +02:00 [DBG] Transport connection checking...
2024-06-11 08:01:34.955 +02:00 [WRN] Transport connection is unhealthy, reconnection...

<!-- Here we receive a message from another client and try to publish a message to yet another client -->

2024-06-11 08:02:04.938 +02:00 [INF] Received package.transfer.completed
2024-06-11 08:02:05.484 +02:00 [ERR] An exception occurred while publishing a message, reason:Failed : . message id:10765409202733075
DotNetCore.CAP.Internal.PublisherSentFailedException: Connection is closed.
 ---> NATS.Client.NATSConnectionClosedException: Connection is closed.
   at NATS.Client.Connection.PublishImpl(String subject, String reply, MsgHeader inHeaders, Byte[] data, Int32 offset, Nullable`1 inCount, Boolean flushBuffer)
   at NATS.Client.Connection.RequestAsyncImpl(String subject, MsgHeader headers, Byte[] data, Int32 offset, Nullable`1 count, Int32 timeout, CancellationToken token)
   at NATS.Client.JetStream.JetStream.PublishAsyncInternal(String subject, Byte[] data, MsgHeader hdr, PublishOptions options)
   at DotNetCore.CAP.NATS.NATSTransport.SendAsync(TransportMessage message)

So I'm able to receive messages, but when I try to publish I get an error saying the connection is closed. Do you have any ideas?

Thanks in advance!

Originally posted by @smedjes in https://github.com/dotnetcore/CAP/issues/533#issuecomment-2159897063

yang-xiaodong commented 3 months ago

Hello @smedjes,

I tested it locally and couldn't reproduce your issue. However, I did find that it's possible to check the connection status when returning the connection to the pool. I will attempt a fix, but I'm not sure if it will be effective for you.
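To illustrate the idea of checking the connection status when a connection is returned to the pool, here is a minimal sketch. It is not CAP's actual implementation: `IPooledConnection`, `CheckedConnectionPool`, and `FakeConnection` are hypothetical names introduced only for this example, and the `IsClosed` property stands in for whatever status check the real transport connection exposes. The extra check in `Rent` is likewise an assumption, added so that a connection that closed while idle is not handed out again.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for a transport connection; not a CAP or NATS.Client type.
public interface IPooledConnection : IDisposable
{
    bool IsClosed { get; }
}

// Minimal pool sketch that refuses to keep closed connections.
public sealed class CheckedConnectionPool<T> where T : class, IPooledConnection
{
    private readonly ConcurrentQueue<T> _idle = new ConcurrentQueue<T>();
    private readonly Func<T> _factory;
    private readonly int _maxSize;
    private int _count; // number of idle connections currently held

    public CheckedConnectionPool(Func<T> factory, int maxSize)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
        _maxSize = maxSize;
    }

    public int IdleCount => Volatile.Read(ref _count);

    public T Rent()
    {
        // Skip any idle connection that was closed while it sat in the pool.
        while (_idle.TryDequeue(out var connection))
        {
            Interlocked.Decrement(ref _count);
            if (!connection.IsClosed) return connection;
            connection.Dispose();
        }
        return _factory();
    }

    // Returns true if the connection was kept for reuse.
    public bool Return(T connection)
    {
        // The status check: a closed connection is disposed, not pooled.
        if (connection.IsClosed)
        {
            connection.Dispose();
            return false;
        }
        if (Interlocked.Increment(ref _count) <= _maxSize)
        {
            _idle.Enqueue(connection);
            return true;
        }
        // Pool is full: undo the reservation and drop the connection.
        Interlocked.Decrement(ref _count);
        connection.Dispose();
        return false;
    }
}

// Test double used only to exercise the sketch.
public sealed class FakeConnection : IPooledConnection
{
    public bool IsClosed { get; set; }
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}
```

With a pool like this, a connection that the server closed during a restart would be discarded on return, so the next publish gets a fresh connection from the factory instead of failing with "Connection is closed."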

smedjes commented 3 months ago

@yang-xiaodong I could test it if you could provide a prerelease! :)

yang-xiaodong commented 3 months ago

> @yang-xiaodong I could test it if you could provide a prerelease!

Good! Version 8.2.0-preview-234883029, which includes this patch, was released to NuGet a few minutes ago.

smedjes commented 3 months ago

It works, thanks @yang-xiaodong!

We were apparently using an older version of CAP (8.0.1), so I wanted to make sure it wasn't a bug that's already been fixed.

I upgraded to your prerelease and it worked. Then I downgraded to version 8.1.2, where I still got the error. Now I've upgraded again, and it's working again.

yang-xiaodong commented 2 months ago

Fixed in version 8.2.0.