SamirFarhat opened this issue 5 years ago.
Did you leave everything else as is and only change the replica count and image name? If so, please send us the application resource ID, the region, and the time when you issued the upgrade.
When I said downtime, I meant about 30 seconds of downtime, which is also unexpected. Yes, I changed nothing except the image tag or the replica count. I also noticed that the Gateway took a long time during the deployment, as if it were being redeployed.
ResourceID : /subscriptions/a7321ec3-3919-442b-8a85-3c8580527c41/resourcegroups/test-sfm/providers/Microsoft.ServiceFabricMesh/applications/helloapp
Region: eastasia
Time: between 7 and 8 PM UTC+1 (last deployment at 7:43 PM)
I checked the log: at 18:26:25 UTC the upgrade completed as a rollout upgrade.
When you mentioned downtime, did the website return "not reachable", or was it just slow to respond?
I was demoing something to my customer, and during the deployment we hit the browser's refresh button. We got the browser page saying that the site is not accessible, or something like that, so it was not reachable. I can demo it.
I uploaded a video showing the behavior: https://www.youtube.com/watch?v=nSkuuhl89ws
Hi Samir, it looks like we have a bug in how we allocate ports for Gateways. I'll fix it and deploy in the next few days.
Thanks!
I was testing exposing multiple ports via the Gateway for this issue, https://github.com/Azure/service-fabric-mesh-preview/issues/315, and noticed the same problem.
When we apply an upgrade to the gateway, the existing routes stop working for a while and requests are not completed.
What changes did you make to the gateway in this scenario? Are you just adding ports?
This issue was initially raised for application upgrades (which should work without a problem now) but there is some expected downtime when a gateway is changed.
I've noticed that any change to a service slows down or breaks the gateway, for example scaling the service or upgrading it as mentioned above.
Also, adding new routes to the gateway takes down every service behind it until the update is complete.
I understand that all these events affect the gateway routes, since the services might move around, so I would expect the gateway to be more reliable and to hot-reload its routing configuration.
These kinds of events will be very common, and in the worst case only the related service/route should be affected. In scenarios where application updates happen multiple times a day, a gateway that fails on every release would be unacceptable.
I noticed this on every app update, such as scaling the replica count or changing the code package's container image version, as I showed in the video. I will retry and report back.
I have retested. Scaling from 1 to 2 replicas, for example, causes a few minutes of downtime. This is unusable in real life.
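To put a number on the outage window reported here (30 seconds in one test, a few minutes in another), one option is to poll the application's public endpoint while the upgrade runs and record when it stops answering. The following is a minimal sketch in Python using only the standard library. It assumes the app is reachable over plain HTTP, and the function names `http_ok`, `probe` and `find_outages` are hypothetical helpers, not part of any Service Fabric Mesh tooling.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Tuple


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection refused, DNS failures, timeouts and 4xx/5xx (HTTPError).
        return False


def probe(check: Callable[[], bool], duration: float,
          interval: float = 1.0) -> List[Tuple[float, bool]]:
    """Call `check` every `interval` seconds for `duration` seconds.

    Returns (monotonic timestamp, ok) samples.
    """
    samples: List[Tuple[float, bool]] = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), check()))
        time.sleep(interval)
    return samples


def find_outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse samples into (start, end) windows where the check failed.

    A window ends at the first successful sample after it. An outage still in
    progress when sampling stops ends at the last sample's timestamp.
    """
    outages: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            outages.append((start, ts))
            start = None
    if start is not None:
        outages.append((start, samples[-1][0]))
    return outages
```

For example, start `probe(lambda: http_ok("http://<your-gateway-address>/"), duration=600)` just before deploying the updated template, then pass the result to `find_outages` and subtract each window's start from its end to get the approximate unreachable time. The resolution is only as fine as `interval`.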
Hence the version "preview". It's a serious issue though.
We are focusing on getting this fixed.
Just an update, the fix for this is being worked on, but holidays are slowing things down a bit. We have testing in place now so hopefully we can turnaround something quickly in January. Just FYI the expectation is that a single replica would maintain availability even during upgrades (upgrading container image for example).
@mattrowmsft This issue is still open and I am still experiencing similar problems. Is there a fix coming? (Your comment mentions January.)
Hi all, I deployed a simple Mesh application with one service and two replicas.
1. I changed the replica count in my template and deployed it. Expected behavior: no downtime. What actually happened: downtime; my site was down.
2. With two replicas, I changed the image tag I'm using and deployed my template. Expected behavior: no downtime. What actually happened: downtime; my site was down.
Is this expected ?
Thanks