Leaving this here both as documentation and for anyone else who runs into this issue and stumbles onto this.
In particular, this has plagued gccollab for a while. In the end it all came down to SNAT port exhaustion caused by the default outbound load balancer rules, which assume you will need enough ports for around 50 nodes minimum and that connections need to be allowed to sit idle for a while. This part of the docs goes into detail on the defaults and what can be changed: https://learn.microsoft.com/en-us/azure/aks/load-balancer-standard#configure-the-allocated-outbound-ports.
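For anyone sizing this for their own cluster, here is a rough sketch of the port arithmetic as I understand it from the linked docs. It assumes each outbound IP provides 64,000 SNAT ports and that the per-node allocation has to be a multiple of 8; both figures come from my reading of that page, so check them before relying on this. The function names are made up for illustration and are not part of any Azure tooling.

```python
# Assumption (from the linked AKS docs): each outbound public IP
# contributes 64,000 SNAT ports to the load balancer.
PORTS_PER_IP = 64_000


def max_nodes(outbound_ips: int, ports_per_node: int) -> int:
    """How many nodes a given per-node port allocation can support."""
    if outbound_ips < 1 or ports_per_node < 1:
        raise ValueError("outbound_ips and ports_per_node must be positive")
    return (outbound_ips * PORTS_PER_IP) // ports_per_node


def ports_per_node_budget(outbound_ips: int, node_count: int) -> int:
    """Largest per-node allocation that still fits node_count nodes.

    Rounded down to a multiple of 8, which is assumed to be required
    for the allocated outbound ports setting.
    """
    if outbound_ips < 1 or node_count < 1:
        raise ValueError("outbound_ips and node_count must be positive")
    raw = (outbound_ips * PORTS_PER_IP) // node_count
    return raw - (raw % 8)


if __name__ == "__main__":
    # Hypothetical cluster: 2 outbound IPs, 10 nodes (including surge nodes).
    print(ports_per_node_budget(outbound_ips=2, node_count=10))
    # How many nodes fit on 1 outbound IP at 1024 ports per node.
    print(max_nodes(outbound_ips=1, ports_per_node=1024))
```

When picking the node count, include headroom for upgrade surge nodes, since every node in the backend pool takes its share of ports. The idle timeout is a separate setting on the same outbound rule; lowering it releases ports sooner but is not modelled here.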