Azure / acs-engine

WE HAVE MOVED: Please join us at Azure/aks-engine!
https://github.com/Azure/aks-engine
MIT License

aks-docker-engine distro with v0.25.2 deployments fails on Azure US Gov #4261

Closed: amrmahdi closed this issue 5 years ago

amrmahdi commented 5 years ago

Is this a request for help?:

No

Is this an ISSUE or FEATURE REQUEST? (choose one):

Issue

What version of acs-engine?:

0.25.2

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

Kubernetes 1.10.8

What happened:

Deploying the following template fails on Azure US Gov:

{
  "apiVersion": "vlabs", 
  "location": "usgovvirginia", 
  "properties": {
    "agentPoolProfiles": [
      {
        "OSDiskSizeGB": 0, 
        "availabilityProfile": "VirtualMachineScaleSets", 
        "count": 1, 
        "distro": "aks-docker-engine", 
        "ipAddressCount": 128, 
        "kubernetesConfig": {
          "controllerManagerConfig": {
            "--terminated-pod-gc-threshold": "50"
          }, 
          "kubeletConfig": {
            "--image-pull-progress-deadline": "30m", 
            "--max-pods": "110"
          }
        }, 
        "name": "agentpool1", 
        "storageProfile": "ManagedDisks", 
        "vmSize": "Standard_DS13_v2"
      }
    ], 
    "linuxProfile": {
      "adminUsername": "user", 
      "ssh": {
        "publicKeys": [
          {
            "keyData": "********"
          }
        ]
      }
    }, 
    "masterProfile": {
      "OSDiskSizeGB": 0, 
      "count": 1, 
      "distro": "aks-docker-engine", 
      "dnsPrefix": "test1", 
      "vmSize": "Standard_D8_v3"
    }, 
    "orchestratorProfile": {
      "kubernetesConfig": {
        "addons": [
          {
            "enabled": false, 
            "name": "blobfuse-flexvolume"
          }, 
          {
            "enabled": false, 
            "name": "smb-flexvolume"
          }, 
          {
            "enabled": false, 
            "name": "keyvault-flexvolume"
          }, 
          {
            "enabled": true, 
            "name": "nvidia-device-plugin"
          }
        ], 
        "apiServerConfig": {
          "--enable-admission-plugins": "Priority,Initializers", 
          "--feature-gates": "PodPriority=true", 
          "--runtime-config": "admissionregistration.k8s.io/v1alpha1,scheduling.k8s.io/v1alpha1=true"
        }, 
        "etcdDiskSizeGB": "2048", 
        "kubeletConfig": {
          "--feature-gates": "PodPriority=true"
        }, 
        "schedulerConfig": {
          "--feature-gates": "PodPriority=true"
        }
      }, 
      "orchestratorType": "Kubernetes", 
      "orchestratorVersion": "1.10.8"
    }, 
    "servicePrincipalProfile": {
      "clientId": "**************************", 
      "secret": "********"
    }
  }
}

We get the following error:

ERROR: Deployment failed. Correlation ID: fd54cb6b-7551-47e4-9be8-dafcc177273c. {
  "error": {
    "code": "InvalidParameter",
    "message": "The value of parameter imageReference.publisher is invalid.",
    "target": "imageReference.publisher"
  }
}

What you expected to happen: Deployment to succeed.

How to reproduce it (as minimally and precisely as possible): Use the above template to deploy to the Azure US Gov cloud.

Anything else we need to know:

jackfrancis commented 5 years ago

@CecileRobertMichon do we need to ensure that the "aks-docker-engine" distro gets the same treatment as "aks" got here:

https://github.com/Azure/acs-engine/issues/3873

CecileRobertMichon commented 5 years ago

No, this is simply because the aks distros are not supported in sovereign clouds (yet). You will need to specify "ubuntu" (or leave it blank and acs-engine will choose ubuntu by default). See https://github.com/Azure/aks-engine/blob/master/docs/clusterdefinition.md#L556

jackfrancis commented 5 years ago

I think this user got the aks-docker-engine distro because of an N series VM SKU. Should we add a higher-precedence setting that sets distro to "ubuntu" for non-public clouds?

CecileRobertMichon commented 5 years ago

It looks like @amrmahdi explicitly specified "aks-docker-engine" in their apimodel (please correct me if I'm wrong). We already default to Ubuntu for non-public cloud (and only override to aks-docker-engine for N series if "aks" was specified).
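The defaulting rule described above can be sketched roughly as follows. This is an illustration of the rule as stated in this thread, not acs-engine's actual code: the names resolveDistro and isNSeries, the "Standard_N" prefix check, and the public-cloud default of "aks" are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// assumedPublicDefault is an assumption made for this sketch; the thread only
// states what happens in non-public clouds.
const assumedPublicDefault = "aks"

// isNSeries is a hypothetical heuristic: it treats any VM size whose name
// starts with "Standard_N" as an N series (GPU) SKU.
func isNSeries(vmSize string) bool {
	return strings.HasPrefix(vmSize, "Standard_N")
}

// resolveDistro sketches the rule from the comment above:
//   - a blank distro defaults to "ubuntu" in non-public clouds,
//   - "aks" is overridden to "aks-docker-engine" for N series SKUs,
//   - any other explicitly specified distro is kept as-is.
func resolveDistro(specified string, publicCloud bool, vmSize string) string {
	distro := specified
	if distro == "" {
		if publicCloud {
			distro = assumedPublicDefault
		} else {
			distro = "ubuntu"
		}
	}
	if distro == "aks" && isNSeries(vmSize) {
		distro = "aks-docker-engine"
	}
	return distro
}

func main() {
	// The apimodel in this issue: explicit distro, US Gov cloud, non-GPU SKU.
	fmt.Println(resolveDistro("aks-docker-engine", false, "Standard_DS13_v2"))
	// Blank distro in a sovereign cloud.
	fmt.Println(resolveDistro("", false, "Standard_NC6"))
}
```

Under this rule an explicitly specified "aks-docker-engine" is never rewritten, which would explain why the template in this issue still reached a sovereign cloud with an AKS image reference.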

amrmahdi commented 5 years ago

Yes and that is because of another issue. https://github.com/Azure/acs-engine/issues/4241#issuecomment-438343260

And yes we need it for gpu pools too.

CecileRobertMichon commented 5 years ago

@amrmahdi I see. In that case this is a regression as we effectively don't support docker-engine without VHDs (which aren't available on sovereign clouds yet) since we switched to Moby. We need to re-add a path to install docker-engine with the ubuntu distro. Sorry for the inconvenience. In the meantime you should use an acs-engine version prior to 0.25.0.

amrmahdi commented 5 years ago

@CecileRobertMichon we upgraded to 0.25.0 to resolve an NVIDIA resiliency issue in 0.24.2 :) Anyway, we are using 0.24.2 until 0.25+ is stable.