Azure / aks-engine

AKS Engine: legacy tool for Kubernetes on Azure (see status)
https://github.com/Azure/aks-engine
MIT License

Unable to Scale masters for AKS-Engine centraluseuap #3484

Closed: amankohli closed this issue 4 years ago

amankohli commented 4 years ago

We currently have only 1 fault domain in centraluseuap, which forces us to deploy only one master when we spin up the aks-engine cluster, as mentioned in the GitHub issue below:

https://github.com/Azure/aks-engine/issues/2285

We need a workaround for scaling masters, as the current setup doesn't allow us to scale master nodes with the command below:

aks-engine scale
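
For context, a minimal sketch of how aks-engine scale is normally invoked against an agent pool (the resource group, pool name, and node count below are illustrative values, not an exact command we ran); the command only targets a named agent pool via --node-pool, which is why it cannot resize masters:

# illustrative values; --node-pool only accepts agent pool names, so masters are out of reach
% aks-engine scale --location centraluseuap \
    --subscription-id ${AZURE_SUBSCRIPTION_ID} \
    --client-id ${AZURE_CLIENT_ID} --client-secret ${AZURE_CLIENT_SECRET} \
    --resource-group k8scentraluseuapdc \
    --api-model _output/k8scentraluseuapdc/apimodel.json \
    --node-pool node --new-node-count 12 \
    --apiserver k8scentraluseuapdc.centraluseuap.cloudapp.azure.com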

welcome[bot] commented 4 years ago

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

mboersma commented 4 years ago

I tried to provision in centraluseuap using AKS Engine master and didn't see any errors for a 3-master cluster. It's possible we've fixed something that affects this recently, or maybe my cluster template differs significantly from yours:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes"
    },
    "masterProfile": {
      "count": 3,
      "dnsPrefix": "",
      "vmSize": "Standard_D2_v3",
      "platformUpdateDomainCount": 1
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 2,
        "vmSize": "Standard_D2_v3"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    }
  }
}

% ./bin/aks-engine deploy --debug --dns-prefix canary-multi-master \
    -f -m kubernetes-multi-master.json -l centraluseuap \
    --client-id=${AZURE_CLIENT_ID} --client-secret=${AZURE_CLIENT_SECRET} \
    --set linuxProfile.ssh.publicKeys\[0\].keyData="${AKSE_PUB_KEY}" \
    --set orchestratorProfile.orchestratorRelease=1.18

mboersma commented 4 years ago

@amankohli can you share more details? Which version of AKS Engine, which version of Kubernetes, and what does your cluster template look like?

amankohli commented 4 years ago

@mboersma
Below is the aks-engine version we are running:

% aks-engine version
Version: v0.47.0
GitCommit: fc55351ca
GitTreeState: clean

Kubernetes version: v1.13.11

Below is the cluster template:

{ "apiVersion": "vlabs", "properties": { "orchestratorProfile": { "orchestratorType": "Kubernetes", "orchestratorRelease": "1.13", "kubernetesConfig": { "networkPlugin": "kubenet", "privateCluster": { "enabled": false } } }, "masterProfile": { "count": 1, "dnsPrefix": "k8scentraluseuapdc", "vmSize": "Standard_D2s_v3", "platformUpdateDomainCount": 1, "OSDiskSizeGB": 100, "vnetSubnetId": "/subscriptions/xx", "firstConsecutiveStaticIP": "10.xx.0.xx" }, "agentPoolProfiles": [ { "name": "node", "count": 9, "vmSize": "Standard_D8s_v3", "OSDiskSizeGB": 100, "vnetSubnetId": "/subscriptions/xx" } ], "linuxProfile": { "adminUsername": "ubuntu", "ssh": { "publicKeys": [ { "keyData": "xx' } ] } }, "servicePrincipalProfile": { "clientId": "xx", "secret": "xx" } } } Maybe we need to use aksengine with a higher version

mboersma commented 4 years ago

Thanks! I'll try to reproduce this with a similar cluster template and v0.47.0. Hopefully we can pin down the problem.

mboersma commented 4 years ago

I'm not able to reproduce this (yet). I used v0.47.0 and Kubernetes 1.13.11 with a nearly identical cluster template and custom VNET.

% ./bin/aks-engine deploy --debug --dns-prefix canary-multi-master \
    -f -m canary-multi-master.json -l centraluseuap \
    --client-id=${AZURE_CLIENT_ID} --client-secret=${AZURE_CLIENT_SECRET} \
    --set linuxProfile.ssh.publicKeys\[0\].keyData="${AKSE_PUB_KEY}" \
    --set orchestratorProfile.orchestratorRelease=1.13
...
INFO[0009] Starting ARM Deployment canary-multi-master-877969274 in resource group canary-multi-master. This will take some time... 
INFO[0203] Finished ARM Deployment (canary-multi-master-877969274). Succeeded 
% ./bin/aks-engine version
Version: v0.47.0
GitCommit: fc55351ca
GitTreeState: dirty
% export KUBECONFIG=_output/canary-multi-master/kubeconfig/kubeconfig.centraluseuap.json 
% kubectl get nodes 
NAME                           STATUS   ROLES    AGE   VERSION
k8s-master-36163078-0          Ready    master   70s   v1.13.11
k8s-master-36163078-1          Ready    master   67s   v1.13.11
k8s-master-36163078-2          Ready    master   75s   v1.13.11
k8s-node-36163078-vmss000000   Ready    agent    82s   v1.13.11

@amankohli could you share the specific "template invalid" error you see?

mboersma commented 4 years ago

@amankohli I'm sure there's a real bug here that I haven't managed to reproduce. Please let me know if you're still blocked by this and if you have any more context to provide.

amankohli commented 4 years ago

@mboersma Sorry for the delay in responding; we were redeploying the cluster and were able to create the 3 masters that way. Thank you for looking into the issue.
The underlying problem remains that we are unable to scale masters; that should be fixed too, since in some scenarios it would be easier to scale up the masters than to redeploy the whole cluster.

jackfrancis commented 4 years ago

Is this an active issue?

mboersma commented 4 years ago

It sounds like a workaround has been found, and we haven't been able to reproduce anything that points to a needed fix, so I'm going to close this issue. Please reopen it if the problem is still ongoing.