JasperFx / wolverine

Supercharged .NET server side development!
https://wolverinefx.net
MIT License

Connections to rabbit queues not re-created after rabbit restart #937

Open misha-p opened 1 week ago

misha-p commented 1 week ago

A Wolverine service listening to a RabbitMQ queue permanently loses its connection channel after RabbitMQ restarts.

To Reproduce

Steps to reproduce the behavior:

  1. Run the Pinger and Ponger projects in Samples/PingPongWithRabbitMq.
  2. In the RabbitMQ management UI, confirm that the queues pings and pongs each have 1 consumer/channel.
  3. Restart the rabbitmq container.
  4. In the RabbitMQ management UI, see that the queues pings and pongs have lost their consumers and that they are not restored. Pinger keeps publishing, but the messages are not consumed.

Expected behavior

After the rabbitmq container has fully restarted, the channels for both queues, pings and pongs, are re-created.

Additional context

Latest codebase, 2.13.0. When I run against the empty RabbitMQ from the Wolverine docker compose, everything works as expected. The above happens when the RabbitMQ server contains many queues used by a bunch of other (NServiceBus) services.

jeremydmiller commented 1 week ago

Michael, I've tested this pretty extensively, and it's been able to reconnect at least locally.

jeremydmiller commented 1 week ago

Worst case scenario, I'll put some kind of "watcher" on it that kickstarts the listener.

misha-p commented 1 week ago

Thanks Jeremy, I understand that this one is hard to reproduce. As I mentioned, it all works great with an "empty" Rabbit server (such as the one in the docker compose in the Wolverine test harness). We see this issue when the Rabbit server is part of a large infrastructure, with 100+ queues and exchanges. I see nothing suspicious in the Rabbit logs at restart time. (BTW, all NServiceBus services reconnect fine, but the Wolverine ones don't, or only "sometimes" or partially, e.g. one of two listeners gets its channel.) I will look at it more too; maybe it really is a matter of making listener startup more aggressive.

jeremydmiller commented 1 week ago

Any issues in the logs? Wolverine would be logging interruptions from the Rabbit MQ client.

jeremydmiller commented 1 week ago

Is it only the listeners that are the problem, or does it flip out trying to send too?

jeremydmiller commented 1 week ago

Sorry @misha-p, one more thing: there's a Rabbit MQ client V7 coming soon that's async all the way down. Not sure about the timing of that, but we've got a PR to add it to Wolverine.

Also, any log messages like:

"Unexpected Rabbit MQ connection shutdown"

or

"Rabbit MQ connection is blocked because of {Reason}"

or

"Rabbit MQ connection error on callback"

misha-p commented 1 week ago

In the logs I see "Unexpected channel shutdown for Rabbit MQ. Wolverine will attempt to restart..." while the Rabbit server is down. Also, I don't think the issue is with senders, only with listeners.

And btw, I experimented a little with the channel agent and, surprisingly, reconnect got fixed with this line:

[image: screenshot of the changed line in the channel agent]

I figured that with a Rabbit server that has tons of queues, the teardownChannel() callback doesn't happen at the right time and the agent state stays Connected, so I forced it. I don't think it's a proper fix, but maybe it will help you reason about our case. Anyway, with that change the channels for the listeners now get created after the Rabbit server is back up.
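
To make the suspected failure mode concrete, here is a minimal, language-neutral sketch in Python. It is not Wolverine's code: `ChannelAgent`, `State`, `ensure_connected`, `on_shutdown` and `simulate` are hypothetical names invented for illustration. The only assumption is the one stated above, that the restart logic skips reconnecting whenever the cached state still says Connected.

```python
from enum import Enum


class State(Enum):
    DISCONNECTED = "disconnected"
    CONNECTED = "connected"


class ChannelAgent:
    """Hypothetical stand-in for a listener's channel agent (not Wolverine's class)."""

    def __init__(self, open_channel):
        self._open_channel = open_channel  # callable that creates a broker channel
        self.state = State.DISCONNECTED
        self.channel = None

    def ensure_connected(self):
        # Reconnect is skipped whenever the cached state says we are connected.
        if self.state is State.CONNECTED:
            return False
        self.channel = self._open_channel()
        self.state = State.CONNECTED
        return True

    def on_shutdown(self, force_reset):
        # force_reset=False models a missed or late teardown callback:
        # the channel is gone but the cached state is left untouched.
        self.channel = None
        if force_reset:
            self.state = State.DISCONNECTED


def simulate(force_reset):
    opened = []
    agent = ChannelAgent(lambda: opened.append("channel") or len(opened))
    agent.ensure_connected()        # initial startup
    agent.on_shutdown(force_reset)  # broker restarts
    agent.ensure_connected()        # restart attempt once the broker is back
    return len(opened), agent.channel is not None
```

With `simulate(False)` only one channel is ever opened and the listener ends up without a channel, which matches the symptom of queues losing their consumers. With `simulate(True)` a second channel is opened after the restart. Whether this is what actually happens inside the real agent is still a guess.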

misha-p commented 1 week ago

Did more tests with the above change. Still not stable: it fully reconnects only sometimes (((

jeremydmiller commented 1 week ago

@misha-p & I spoke today. The problem seems to be only on listeners, and not senders (we think). I'm going to make a change where a Rabbit MQ listener immediately tries to send a Ping message when it starts up, to see if that helps.