Particular / ServiceControl

Backend for ServiceInsight and ServicePulse
https://docs.particular.net/servicecontrol/

ServiceControl leaks connections with RabbitMQ transport #4226

Closed ramonsmits closed 2 months ago

ramonsmits commented 5 months ago

Describe the bug

Description

When connectivity to the RabbitMQ broker is interrupted, for example because a RabbitMQ cluster is temporarily unavailable (no connectivity is possible, or the cluster is in a minority and does not accept messages), ServiceControl starts to leak connections.

Because the connections are not released, the broker can reach its connection limit and reject any new connections, causing an outage of the services that need to connect.

Expected behavior

No connections should leak when connectivity to the broker is restored, and resources should be released.

Actual behavior

Connections are not released until ServiceControl is restarted.

Below is for version 5.2.4.

Before: before connectivity is lost

After: after connectivity has been restored

Versions

ServiceControl 5.2.4 (primary and audit instances) with the RabbitMQ transport.

Steps to reproduce

  1. Install SC 5.2.4 primary/audit with RabbitMQ
  2. Stop the broker
  3. Wait a couple of minutes (likely past the 2-minute period after which a critical error is raised)
  4. Start the broker
  5. Observe the connections being restored
  6. Wait a while (2 minutes max)
  7. Observe that more connections are slowly being established
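
To make steps 5 to 7 easier to observe, the connection count can be polled from the RabbitMQ management API. The following is a minimal sketch, not part of ServiceControl. It assumes the management plugin is enabled and reachable at `http://localhost:15672` with the default `guest`/`guest` account; the URL, credentials, and helper names (`count_connections`, `fetch_connection_count`, `watch`) are illustrative assumptions to adjust for your broker.

```python
import base64
import json
import time
import urllib.request

# Assumptions: management plugin enabled on localhost:15672 with the default
# guest/guest account. Change these for your own broker.
MANAGEMENT_URL = "http://localhost:15672/api/connections"
USERNAME = "guest"
PASSWORD = "guest"


def count_connections(payload: str) -> int:
    """Return the number of entries in a /api/connections JSON response."""
    return len(json.loads(payload))


def fetch_connection_count(url: str = MANAGEMENT_URL) -> int:
    """Query the management API once and return the open connection count."""
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_connections(response.read().decode("utf-8"))


def watch(interval_seconds: float = 10.0, samples: int = 30) -> None:
    """Print the connection count periodically so growth is easy to spot."""
    for _ in range(samples):
        try:
            print(time.strftime("%H:%M:%S"), fetch_connection_count())
        except OSError as error:  # broker stopped or unreachable
            print(time.strftime("%H:%M:%S"), "broker unreachable:", error)
        time.sleep(interval_seconds)
```

Calling `watch()` while running through the steps should show the count before the broker is stopped, errors while it is down, and then whether the count keeps climbing after it is started again. Running `rabbitmqctl list_connections` on the broker host is an alternative if the management plugin is not enabled.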

Relevant log output

No response

Additional Information

Workarounds

Restarting ServiceControl releases all connections.

Possible solutions

Additional information

ramonsmits commented 5 months ago

In 5.2.4, the TransportInfrastructure is currently re-created without shutting down the existing instance at:

AuditIngestion:

https://github.com/Particular/ServiceControl/blob/93bf0e844ef541dd3077a70bc2e081c15c88296d/src/ServiceControl.Audit/Auditing/AuditIngestion.cs#L111-L117

ErrorIngestion:

https://github.com/Particular/ServiceControl/blob/93bf0e844ef541dd3077a70bc2e081c15c88296d/src/ServiceControl/Operations/ErrorIngestion.cs#L145-L151
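
The pattern described above can be illustrated with a small model. This is a hedged sketch, not the ServiceControl code and not necessarily how the issue was resolved: `FakeBroker`, `FakeTransportInfrastructure`, `LeakyIngestion`, and `DisposingIngestion` are hypothetical names, and the broker is only a counter. It shows how re-creating an infrastructure object on every restart attempt accumulates connections, while shutting down the previous instance first keeps the count stable.

```python
class FakeBroker:
    """Hypothetical stand-in for a broker; it only counts open connections."""

    def __init__(self):
        self.open_connections = 0


class FakeTransportInfrastructure:
    """Hypothetical infrastructure object that holds one broker connection."""

    def __init__(self, broker):
        self.broker = broker
        self.broker.open_connections += 1
        self.is_shut_down = False

    def shutdown(self):
        # Release the connection exactly once, even if called repeatedly.
        if not self.is_shut_down:
            self.broker.open_connections -= 1
            self.is_shut_down = True


class LeakyIngestion:
    """Re-creates the infrastructure without shutting down the old instance."""

    def __init__(self, broker):
        self.broker = broker
        self.infrastructure = None

    def ensure_started(self):
        # The previous instance is dropped but never shut down,
        # so its connection stays open on the broker.
        self.infrastructure = FakeTransportInfrastructure(self.broker)


class DisposingIngestion(LeakyIngestion):
    """Shuts down the existing instance before creating a new one."""

    def ensure_started(self):
        if self.infrastructure is not None:
            self.infrastructure.shutdown()
        self.infrastructure = FakeTransportInfrastructure(self.broker)


def connections_after_restarts(ingestion_type, restarts):
    """Simulate repeated restart attempts and return the open connection count."""
    broker = FakeBroker()
    ingestion = ingestion_type(broker)
    for _ in range(restarts):
        ingestion.ensure_started()
    return broker.open_connections
```

In this model, `connections_after_restarts(LeakyIngestion, 5)` leaves five connections open, whereas `connections_after_restarts(DisposingIngestion, 5)` leaves one.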