Azure / aks-engine

AKS Engine: legacy tool for Kubernetes on Azure (see status)
https://github.com/Azure/aks-engine
MIT License

Deployment fails when specifying customSearchDomain #1577

Closed mohatb closed 5 years ago

mohatb commented 5 years ago

Describe the bug

When using customSearchDomain the deployment fails with the below error:

Deployment failed. Correlation ID: edf9f57f-6512-4240-9b89-6d60336e325a. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "VMExtensionProvisioningError",
        "message": "VM has reported a failure when processing extension 'cse-master-0'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=80\n[stdout]\n\n[stderr]\nConnection to k8s.gcr.io 443 port [tcp/https] succeeded!\nConnection to gcr.io 443 port [tcp/https] succeeded!\nConnection to docker.io 443 port [tcp/https] succeeded!\n\"."
      }
    ]
  }
}

Looking at exit code 80, it appears to be related to setup-custom-search-domains.sh: https://github.com/Azure/acs-engine/blob/master/parts/k8s/setup-custom-search-domains.sh

Also, after SSHing to the master node, cluster-provision.log shows the error below at the end of the file:

sudo: unable to resolve host k8s-master-34763407-0

Steps To Reproduce

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
        "networkPolicy": "calico"
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "testabutalebdns",
      "vmSize": "Standard_B2s",
      "vnetSubnetId": "<subnetidwashere>",
      "firstConsecutiveStaticIP": "10.240.10.1",
      "vnetCidr": "10.240.0.0/16",
      "preProvisionExtension": {
        "name": "register-dns",
        "singleOrAll": "All"
      }
    },
    "agentPoolProfiles": [
      {
        "name": "staging",
        "count": 1,
        "vmSize": "Standard_B2s",
        "vnetSubnetId": "<subnetidwashere>",
        "availabilityProfile": "VirtualMachineScaleSets",
        "preProvisionExtension": {
          "name": "register-dns",
          "singleOrAll": "All"
        }
      },
      {
        "name": "production",
        "count": 1,
        "vmSize": "Standard_B2s",
        "vnetSubnetId": "<subnetidwashere>",
        "availabilityProfile": "VirtualMachineScaleSets",
        "preProvisionExtension": {
          "name": "register-dns",
          "singleOrAll": "All"
        }
      }
    ],
    "linuxProfile": {
      "customNodesDNS": {
        "dnsServer": "10.240.0.6"
      },
      "customSearchDomain": {
        "name": "porthit.com",
        "realmUser": "user1",
        "realmPassword": "password1"
      },
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    },
    "extensionProfiles": [
      {
        "name": "register-dns",
        "version": "v1",
        "extensionParameters": "porthit.com",
        "rootURL": "https://raw.githubusercontent.com/Azure/aks-engine/master/extensions/dnsupdate/v1/",
        "script": "register-dns.sh"
      }
    ],
    "servicePrincipalProfile": {
      "clientId": "<clientidwashere>",
      "secret": "<secretwashere>"
    }
  }
}

Expected behavior

AKS Engine version

Version: v0.37.4 GitCommit: 9e8364ee9 GitTreeState: clean

Kubernetes version

latest provided by aks-engine

Additional context

welcome[bot] commented 5 years ago

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

mohatb commented 5 years ago

Hello team, any update?

UnwashedMeme commented 5 years ago

Issue 1:

When I SSH'd into the computer I found that the file still had the template placeholders in it, e.g.


$ cat /opt/azure/containers/setup-custom-search-domains.sh
#!/bin/bash
set -x
source /opt/azure/containers/provision_source.sh
echo "  dns-search <searchDomainName>" | tee -a /etc/network/interfaces.d/50-cloud-init.cfg
systemctl_restart 20 5 10 restart networking
wait_for_apt_locks
retrycmd_if_failure 10 5 120 apt-get -y install realmd sssd sssd-tools samba-common samba samba-common python2.7 samba-libs packagekit
wait_for_apt_locks
echo "<searchDomainRealmPassword>" | realm join -U <searchDomainRealmUser>@$(echo "<searchDomainName>" | tr /a-z/ /A-Z/) $(echo "<searchDomainName>" | tr /a-z/ /A-Z/)

Digging in the code a bit this morning, I saw a file named kubelet.sh with a sed command that looks like it should fix this; perhaps kubelet.sh now doesn't run until after this file is invoked?
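
For anyone wanting to confirm the same symptom on their own node, here is a minimal sketch of a check for leftover placeholders. The helper name find_placeholders is hypothetical and not part of aks-engine, and the pattern assumes the placeholders all look like the <searchDomain...> tokens in the output above.

```shell
#!/bin/bash
# Hypothetical helper (not part of aks-engine): print each distinct
# unsubstituted placeholder found in the given file, one per line.
# Assumes placeholders have the form <searchDomain...>, as seen above.
find_placeholders() {
  local file="$1"
  grep -oE '<searchDomain[A-Za-z]*>' "$file" | sort -u
}

# Warn if the provisioned script still contains placeholders.
script=/opt/azure/containers/setup-custom-search-domains.sh
if [ -f "$script" ] && [ -n "$(find_placeholders "$script")" ]; then
  echo "unsubstituted placeholders found in $script:"
  find_placeholders "$script"
fi
```

If this prints nothing on an affected node, the placeholders were substituted and the failure likely has another cause.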

Issue 2:

If you try this on an Ubuntu 18.04-based image ("distro": "aks-ubuntu-18.04"), it won't work because the script's first command appends to /etc/network/interfaces.d/50-cloud-init.cfg, which doesn't exist because ifupdown has been replaced with netplan.
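
A rough way to tell which case a node falls into is sketched below. The helper detect_net_config is hypothetical, and it only assumes that netplan keeps its configuration under /etc/netplan and that the ifupdown-style image has the cloud-init file named above. The optional root argument exists only so the check can be tried against a scratch directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of aks-engine): report which network
# configuration style is present under the given root (default: /).
detect_net_config() {
  local root="${1:-}"
  if [ -f "$root/etc/network/interfaces.d/50-cloud-init.cfg" ]; then
    echo "ifupdown"   # the file the script appends to exists
  elif [ -d "$root/etc/netplan" ]; then
    echo "netplan"    # netplan config directory, no ifupdown cloud-init file
  else
    echo "unknown"
  fi
}

detect_net_config
```

On a node that reports netplan, the dns-search line in setup-custom-search-domains.sh has no existing file to append to, which matches the behavior described above.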

mohatb commented 5 years ago

@unwashedmeme check https://github.com/Azure/aks-engine/pull/1635

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.