Open ArunBee84 opened 1 year ago
I'm not entirely sure, let's ask @bradygaster and @ReubenBond for their thoughts on this.
Seems to have been originally reported here: https://github.com/Azure-Samples/Orleans-Cluster-on-Azure-App-Service/issues/3
Hi @ArunBee84 - thank you for posting this issue, I see that you also posted it on the Azure-Sample repo as well. Sorry for not seeing this sooner. In talking this over with the team, they're suggesting that you try configuring with two different silo names. That way they aren't trying to dial up the same cluster. Does that make sense?
I would guess that your staging slot and production slot have the same ClusterId set, so they are trying to form a single cluster... but they cannot because they have no network connectivity between them.
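A minimal sketch of the "different cluster per slot" idea, written in Python only to keep it language-neutral. It assumes the App Service `WEBSITE_SLOT_NAME` environment variable is available to the process; the helper name `cluster_id_for_slot` and the base ID `shopping-cart` are hypothetical. In the .NET silo, the resulting string would be what gets assigned to `ClusterOptions.ClusterId`.

```python
import os


def cluster_id_for_slot(base_cluster_id, environ=None):
    """Derive a per-slot cluster ID so each slot forms its own cluster.

    base_cluster_id: hypothetical base name shared by all slots.
    environ: mapping to read from; defaults to the process environment.
    """
    env = os.environ if environ is None else environ
    # Assumption: App Service sets WEBSITE_SLOT_NAME; fall back to
    # "Production" when the variable is missing (e.g. when run locally).
    slot = env.get("WEBSITE_SLOT_NAME", "Production").strip().lower()
    return f"{base_cluster_id}-{slot}"


if __name__ == "__main__":
    # Prints the ID this process would use for its cluster.
    print(cluster_id_for_slot("shopping-cart"))
```

Note that separate cluster IDs mean separate clusters, so grains activated in one slot are not visible to the other.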
You also need to set up VNet integration on the staging slot. Creating a deployment slot and copying the app configuration does not set up VNet integration on the staging slot. I've made an example project in which both the production slot and the staging slot have two instances running simultaneously. The Bicep code is here.
Hi @windperson,
Sorry, I forgot to mention this before, but I did verify that the staging silo is connected to the same subnet as the production silo. Here is the screenshot from Azure Table Storage showing the silo instances created.
The silo with status Joining is the staging instance. As you can see, its IP address is the same as the production silo's, but the status remains Joining.
> Hi @ArunBee84 - thank you for posting this issue, I see that you also posted it on the Azure-Sample repo as well. Sorry for not seeing this sooner. In talking this over with the team, they're suggesting that you try configuring with two different silo names. That way they aren't trying to dial up the same cluster. Does that make sense?
Colleague of @ArunBee84 here
The issue with that is that the staging slot connects to the same Service Bus as the production slot, so it would activate grains. Because those grains would live in a new cluster, they would be spawned fresh instead of continuing from the existing state. We can work around this by ensuring that the staging slot doesn't connect to our Service Bus, so that it doesn't activate any grains, but we would still like to figure out why the staging silo has no connectivity to the production cluster, as @ReubenBond also mentions.
Do we know if this is expected behavior?
> I would guess that your staging slot and production slot have the same ClusterId set, so they are trying to form a single cluster... but they cannot because they have no network connectivity between them.
Hi @ReubenBond, both the production and staging slots are connected to the same VNet/subnet, with no NSG rules applied to them.
Hi guys,
Any update on this issue?
Adding @btardif to this discussion to get additional eyes on this issue from the App Service team side. I'm going to deploy a slotted instance of this app to see if I can emulate this environmental setup and replicate the issue. @windperson - your Bicep code - does that represent the entire topology deployed in a slotted instance? If not, I think I'll update this sample to reflect just that, so I'd appreciate any pull requests if you have that Bicep available.
Hi @bradygaster, the Bicep sample is from my Chinese article, which should be re-authored as a printed book in the next few months. It produces a lab resource group with the Azure resources shown in the following picture: https://github.com/windperson/2022ithome_30days/blob/main/articles/day37/OrleansUrlShortener.svg The Azure App Service contains a production slot and a staging slot, each with two instances running. You can take that as part of the official sample if you like 😄👌
In my fork of this repo, I've created a slots branch in which I create a slotted version of the site. I've tweaked the code so that this is the setup:
default subnet and a staging subnet
I know this is somewhat variant from the topology @ArunBee84 proposed, but wanted to see if this would mitigate some of the issues folks have run into in this thread and from the original issue. cc @btardif to see if he has any recommendations on this front, and @IEvangelist as I think it'd be good to create a PR to the main fork of this sample, but that would require some doc updates, so I'd prefer to coordinate those together.
Issue description
Using the Orleans/ShoppingCart project, I was able to build and deploy successfully to an Azure Web App. To build a new environment on Azure, I used the Bicep templates mentioned here. Everything worked fine up to this point. Then I created a staging slot by copying the production slot configuration, and deployed the same ShoppingCart build to it. The staging slot silo is stuck in the Joining state, giving the error below.
If I open Kudu on the staging slot and try to ping the production silo address, I get the following exception:
Anyone got any ideas as to what could be wrong?
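As a complement to the ping attempt from Kudu, a TCP-level check can help show whether the silo port itself is reachable, since ICMP may not be permitted in that environment. The following is only a sketch: the address `10.0.0.4` is a hypothetical placeholder for the production silo's private IP, and `11111` is the Orleans default silo port, which may differ from the port your deployment actually uses.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    SILO_HOST = "10.0.0.4"  # hypothetical: replace with the silo address from the membership table
    SILO_PORT = 11111       # assumption: Orleans default silo-to-silo port
    reachable = can_connect(SILO_HOST, SILO_PORT)
    print(f"{SILO_HOST}:{SILO_PORT} reachable: {reachable}")
```

A `False` result from the staging slot would be consistent with the missing network connectivity suggested earlier in the thread, though it does not by itself identify the cause.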