👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.
I tried to provision in centraluseuap using AKS Engine master and didn't see any errors for a 3-master cluster. It's possible we've fixed something that affects this recently, or maybe my cluster template differs significantly from yours:
{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes"
    },
    "masterProfile": {
      "count": 3,
      "dnsPrefix": "",
      "vmSize": "Standard_D2_v3",
      "platformUpdateDomainCount": 1
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 2,
        "vmSize": "Standard_D2_v3"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    }
  }
}
% ./bin/aks-engine deploy --debug --dns-prefix canary-multi-master \
-f -m kubernetes-multi-master.json -l centraluseuap \
--client-id=${AZURE_CLIENT_ID} --client-secret=${AZURE_CLIENT_SECRET} \
--set linuxProfile.ssh.publicKeys\[0\].keyData="${AKSE_PUB_KEY}" \
--set orchestratorProfile.orchestratorRelease=1.18
@amankohli can you share more details? Which version of AKS Engine, which version of Kubernetes, and what does your cluster template look like?
@mboersma
We are running the following AKS Engine version:
aks-engine version
Version: v0.47.0
GitCommit: fc55351ca
GitTreeState: clean
Kubernetes version: v1.13.11
Below is the cluster template:
{ "apiVersion": "vlabs", "properties": { "orchestratorProfile": { "orchestratorType": "Kubernetes", "orchestratorRelease": "1.13", "kubernetesConfig": { "networkPlugin": "kubenet", "privateCluster": { "enabled": false } } }, "masterProfile": { "count": 1, "dnsPrefix": "k8scentraluseuapdc", "vmSize": "Standard_D2s_v3", "platformUpdateDomainCount": 1, "OSDiskSizeGB": 100, "vnetSubnetId": "/subscriptions/xx", "firstConsecutiveStaticIP": "10.xx.0.xx" }, "agentPoolProfiles": [ { "name": "node", "count": 9, "vmSize": "Standard_D8s_v3", "OSDiskSizeGB": 100, "vnetSubnetId": "/subscriptions/xx" } ], "linuxProfile": { "adminUsername": "ubuntu", "ssh": { "publicKeys": [ { "keyData": "xx' } ] } }, "servicePrincipalProfile": { "clientId": "xx", "secret": "xx" } } } Maybe we need to use aksengine with a higher version
Thanks! I'll try to reproduce this with a similar cluster template and v0.47.0. Hopefully we can pin down the problem.
I'm not able to reproduce this (yet). I used v0.47.0 and Kubernetes 1.13.11 with a nearly identical cluster template and custom VNET.
% ./bin/aks-engine deploy --debug --dns-prefix canary-multi-master \
-f -m canary-multi-master.json -l centraluseuap \
--client-id=${AZURE_CLIENT_ID} --client-secret=${AZURE_CLIENT_SECRET} \
--set linuxProfile.ssh.publicKeys\[0\].keyData="${AKSE_PUB_KEY}" \
--set orchestratorProfile.orchestratorRelease=1.13
...
INFO[0009] Starting ARM Deployment canary-multi-master-877969274 in resource group canary-multi-master. This will take some time...
INFO[0203] Finished ARM Deployment (canary-multi-master-877969274). Succeeded
% ./bin/aks-engine version
Version: v0.47.0
GitCommit: fc55351ca
GitTreeState: dirty
% export KUBECONFIG=_output/canary-multi-master/kubeconfig/kubeconfig.centraluseuap.json
% kubectl get nodes
NAME                           STATUS   ROLES    AGE   VERSION
k8s-master-36163078-0          Ready    master   70s   v1.13.11
k8s-master-36163078-1          Ready    master   67s   v1.13.11
k8s-master-36163078-2          Ready    master   75s   v1.13.11
k8s-node-36163078-vmss000000   Ready    agent    82s   v1.13.11
@amankohli could you share the specific "template invalid" error you see?
@amankohli I'm sure there's a real bug here that I haven't managed to reproduce. Please let me know if you're still blocked by this and if you have any more context to provide.
@mboersma Sorry for the delay in the response; we were redeploying the cluster and were able to create the 3 masters. Thank you for looking into the issue.
The underlying issue is that we are unable to scale the masters. This should be fixed too, since in some scenarios it would be easier to simply scale up the masters rather than redeploy the whole cluster.
Is this an active issue?
It sounds like a workaround has been found, and we haven't been able to reproduce anything that points to a needed fix, so I'm going to close this issue. Please reopen it if this is still ongoing.
We currently have only 1 fault domain in centraluseuap, which forces us to deploy only one master at the time we spin up the aks-engine cluster, as mentioned in the GitHub issue below:
https://github.com/Azure/aks-engine/issues/2285
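For anyone hitting the same constraint, a minimal masterProfile sketch for a single-fault-domain region might look like the fragment below. This assumes the vlabs API model also accepts platformFaultDomainCount alongside platformUpdateDomainCount, as discussed in the linked issue; the field names should be verified against the API model for your AKS Engine version.
{
  "masterProfile": {
    "count": 1,
    "dnsPrefix": "k8scentraluseuapdc",
    "vmSize": "Standard_D2s_v3",
    "platformFaultDomainCount": 1,
    "platformUpdateDomainCount": 1
  }
}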
We need a way or a workaround to scale the masters, as the current setup doesn't allow us to scale the master nodes with the command below:
aks-engine scale
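For reference, a typical scale invocation targets an agent pool rather than the masters. A sketch of such a run is below; the resource group, pool name, and node count are example values taken from the template above, and the flag spellings should be checked against aks-engine scale --help for your version.
# hypothetical example values; note there is no flag for targeting the master pool
% ./bin/aks-engine scale --subscription-id ${AZURE_SUBSCRIPTION_ID} \
  --resource-group k8scentraluseuapdc --location centraluseuap \
  --client-id ${AZURE_CLIENT_ID} --client-secret ${AZURE_CLIENT_SECRET} \
  --api-model _output/k8scentraluseuapdc/apimodel.json \
  --node-pool node --new-node-count 12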