Hybrid Cluster with Custom Vnet Doesn't seem to work with acs-engine v0.11 #1949

Closed: jwalker343 closed this issue 5 years ago

jwalker343 commented 6 years ago

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm): Kubernetes 1.7.9

What happened: I downloaded the latest release of acs-engine and created a hybrid cluster in a custom vnet with the template below. Windows pods fail to start with an "Error syncing pod" message.

What you expected to happen: Windows Pods should run properly.

I have a vnet VN-Sandbox1-useast with 3 subnets: k8smaster = k8sagent = k8sclustersubnet =


{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
        "clusterSubnet": "",
        "networkPolicy": "none"
      }
    },
    "masterProfile": {
      "count": 3,
      "dnsPrefix": "sandbox1-useast",
      "vmSize": "Standard_D2_v2",
      "vnetSubnetId": "/subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8smaster",
      "firstConsecutiveStaticIP": "",
      "vnetCidr": ""
    },
    "agentPoolProfiles": [
      {
        "name": "linuxpool1",
        "count": 2,
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/xxxx/resourceGroups/RG-sandbox1-USEast/providers/Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8sagent",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Linux"
      },
      {
        "name": "winpool1",
        "count": 2,
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/xxxx/resourceGroups/RG-sandbox1-USEast/providers/Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8sagent",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows"
      }
    ],
    "windowsProfile": {
      "adminUsername": "k8sagentadmin",
      "adminPassword": "xxxx"
    },
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "ssh-rsa xxxx"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "xxxx",
      "secret": "xxxx"
    }
  }
}
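
Before generating templates, it can help to check that the custom vnet fields in the API model are actually filled in. The following is a minimal sketch, not part of acs-engine: the function name `check_custom_vnet` is hypothetical, and the rule that each of these fields must be non-empty is an assumption based only on the fields used in the template above.

```python
def check_custom_vnet(api_model: dict) -> list:
    """Return a list of problems found; an empty list means none were found.

    Assumption: for a custom vnet, every field checked below must be non-empty.
    """
    props = api_model.get("properties", {})
    problems = []

    # Master profile fields that appear in the template in this thread.
    master = props.get("masterProfile", {})
    for key in ("vnetSubnetId", "firstConsecutiveStaticIP", "vnetCidr"):
        if not master.get(key):
            problems.append(f"masterProfile.{key} is empty or missing")

    # Every agent pool (Linux and Windows) should point at a subnet.
    for pool in props.get("agentPoolProfiles", []):
        if not pool.get("vnetSubnetId"):
            problems.append(
                f"agentPoolProfiles[{pool.get('name')}].vnetSubnetId is empty or missing"
            )

    # The pod network range configured for Kubernetes.
    k8s_config = props.get("orchestratorProfile", {}).get("kubernetesConfig", {})
    if not k8s_config.get("clusterSubnet"):
        problems.append("kubernetesConfig.clusterSubnet is empty or missing")

    return problems
```

Loading the API model with `json.load` and passing the result to `check_custom_vnet` returns one message per empty field, which makes a stripped or forgotten value easy to spot before deployment.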

Run ./acs-engine generate template.json

Edit the azuredeploy.json file and add the subnet variable due to

Run az group deployment create --template-file "azuredeploy.json" --parameters "azuredeploy.parameters.json" -g RG-Sandbox1-useast -n VN-Sandbox1-useast

Update route tables: az network vnet subnet update -n k8smaster -g RG-Sandbox1-useast --vnet-name VN-sandbox1-useast --route-table <RT_NAME>

az network vnet subnet update -n k8sagent -g RG-Sandbox1-useast --vnet-name VN-Sandbox1-useast --route-table <RT_NAME>
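
The two route table updates above differ only in the subnet name, so they can be generated from a list. This is a small sketch that only builds the command strings and does not run them; the helper name `route_table_commands` and the route table name `my-route-table` are placeholders, not values from this cluster.

```python
import shlex


def route_table_commands(resource_group, vnet_name, subnets, route_table):
    """Build one 'az network vnet subnet update' command line per subnet."""
    commands = []
    for subnet in subnets:
        args = [
            "az", "network", "vnet", "subnet", "update",
            "-n", subnet,
            "-g", resource_group,
            "--vnet-name", vnet_name,
            "--route-table", route_table,
        ]
        # Quote each argument so the printed line is safe to paste into a shell.
        commands.append(" ".join(shlex.quote(arg) for arg in args))
    return commands


# "my-route-table" is a hypothetical name; substitute the route table
# that the deployment created in the resource group.
for command in route_table_commands(
    "RG-Sandbox1-useast", "VN-Sandbox1-useast", ["k8smaster", "k8sagent"], "my-route-table"
):
    print(command)
```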

Run a standard aspnet image:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: scratch-dep
spec:
  replicas: 1
  template:
    metadata:
      name: scratch-dep
      labels:
        app: asp
    spec:
      restartPolicy: Always
      nodeSelector: windows
      containers:
      - name: asp
        imagePullPolicy: Always
        image: microsoft/aspnet:4.7.1-windowsservercore-1709
        tty: true
        stdin: true
        ports:
        - containerPort: 80

PS C:\k> cat .\kubelet.log
waiting to discover pod CIDR
Sleeping for 10s, and then waiting to discover pod CIDR

No HNS network found, creating a new one...
VERBOSE: Invoke-HNSRequest Method[POST] Path[/networks/] Data[{

    "Subnets":  [



                        "AddressPrefix":  ""



    "Name":  "l2bridge",

    "Type":  "L2Bridge"

VERBOSE: Result: { "Error" : "The parameter is incorrect. ", "Success" : false
Generated CNI Config [@{cniVersion=0.2.0; name=l2bridge; type=wincni.exe; master=Ethernet; capabilities=; ipam=; dns=; AdditionalArgs=System.Object[]}]
PS C:\k>


{"level":"debug","msg":"[cni-net] Processing ADD command with args {ContainerID:c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Netns:none IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=scratch-dep-1585851119-h6jp0;K8S_POD_INFRA_CONTAINER_ID=c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Path:/opt/wincni.exe/bin;c:\\k\\cni}.","time":"2017-12-19T16:41:47Z"}
{"level":"error","msg":"[cni-net] Failed to parse network configuration, err:invalid IP address: /subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.1.","time":"2017-12-19T16:41:47Z"}
{"level":"debug","msg":"[cni-net] Plugin stopped.","time":"2017-12-19T16:41:47Z"}
E1219 16:41:47.936922     852 cni.go:238] Error adding network: unexpected end of JSON input
E1219 16:41:47.936922     852 cni.go:206] Error while adding to cni network: unexpected end of JSON input
E1219 16:41:49.457152     852 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "scratch-dep-1585851119-h6jp0_default" network: unexpected end of JSON input
E1219 16:41:49.457152     852 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "scratch-dep-1585851119-h6jp0_default" network: unexpected end of JSON input
E1219 16:41:49.457152     852 kuberuntime_manager.go:624] createPodSandbox for pod "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "scratch-dep-1585851119-h6jp0_default" network: unexpected end of JSON input
E1219 16:41:49.457152     852 pod_workers.go:182] Error syncing pod 7b60a790-e4db-11e7-b16b-000d3a14ca19 ("scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)"), skipping: failed to "CreatePodSandbox" for "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" with CreatePodSandboxError: "CreatePodSandbox for pod \"scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"scratch-dep-1585851119-h6jp0_default\" network: unexpected end of JSON input"
I1219 16:41:49.870640     852 kubelet.go:1917] SyncLoop (PLEG): "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)", event: &pleg.PodLifecycleEvent{ID:"7b60a790-e4db-11e7-b16b-000d3a14ca19", Type:"ContainerDied", Data:"c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d"}
W1219 16:41:49.870640     852 pod_container_deletor.go:77] Container "c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d" not found in pod's containers
I1219 16:41:50.268678     852 kuberuntime_manager.go:389] No ready sandbox for pod "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" can be found. Need to start a new one
I1219 16:41:50.268678     852 kuberuntime_manager.go:463] Container {Name:asp Image:microsoft/aspnet:4.7.1-windowsservercore-1709 Command:[] Args:[] WorkingDir: Ports:[{Name: HostPort:0 ContainerPort:80 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-14lt1 ReadOnly:true MountPath:/var/run/secrets/ SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:nil Stdin:true StdinOnce:false TTY:true} is dead, but RestartPolicy says that we should restart it.
{"level":"debug","msg":"[cni-net] Plugin wcn-net version .","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[net] Network interface: {Index:3 MTU:1500 Name:Ethernet 3 HardwareAddr:00:0d:3a:14:ce:87 Flags:up|broadcast|multicast} with IP addresses: [fe80::1ce9:58b:36af:8180/64]","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[net] Network interface: {Index:11 MTU:1500 Name:vEthernet (nat) HardwareAddr:00:15:5d:ce:fd:51 Flags:up|broadcast|multicast} with IP addresses: [fe80::58c1:329:b498:9a1a/64]","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[net] Network interface: {Index:1 MTU:-1 Name:Loopback Pseudo-Interface 1 HardwareAddr: Flags:up|loopback|multicast} with IP addresses: [::1/128]","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[cni-net] Plugin started.","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[cni-net] Processing DEL command with args {ContainerID:c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Netns:none IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=scratch-dep-1585851119-h6jp0;K8S_POD_INFRA_CONTAINER_ID=c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Path:/opt/wincni.exe/bin;c:\\k\\cni}.","time":"2017-12-19T16:41:50Z"}
{"level":"error","msg":"[cni-net] Failed to parse network configuration, err:invalid IP address: /subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.1.","time":"2017-12-19T16:41:50Z"}

I've also tried this with "networkPolicy": "azure" and I get the same result. Please let me know if you need any more information.
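
The CNI error in the log reports a string beginning with /subscriptions/ as an invalid IP address, which suggests that a subnet resource ID ended up where an address prefix was expected. That is a reading of the log, not a confirmed root cause. A rough sketch of how to tell the two kinds of value apart, using a hypothetical helper `is_cidr`:

```python
import ipaddress


def is_cidr(value: str) -> bool:
    """Return True only if value looks like an address prefix such as 10.0.0.0/16."""
    try:
        # strict=False accepts prefixes whose host bits are set, e.g. 10.0.0.5/16.
        ipaddress.ip_network(value, strict=False)
    except ValueError:
        # Raised for empty strings, resource IDs, and anything else that
        # is not an IPv4 or IPv6 network.
        return False
    # A bare address parses as a /32 or /128 network; require an explicit prefix length.
    return "/" in value
```

Running a check like this over the subnet values in the generated CNI config or the generated azuredeploy.json would show whether a resource ID or an empty string is being passed instead of a prefix.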

jwalker343 commented 6 years ago

Linux Containers run fine:

  Normal  Scheduled              23s   default-scheduler                   Successfully assigned scratch-dep-1136686056-xjpdv to k8s-linuxpool1-40233731-0
  Normal  SuccessfulMountVolume  23s   kubelet, k8s-linuxpool1-40233731-0  MountVolume.SetUp succeeded for volume "default-token-14lt1"
  Normal  Pulling                22s   kubelet, k8s-linuxpool1-40233731-0  pulling image "nginx"
  Normal  Pulled                 18s   kubelet, k8s-linuxpool1-40233731-0  Successfully pulled image "nginx"
  Normal  Created                17s   kubelet, k8s-linuxpool1-40233731-0  Created container
  Normal  Started                17s   kubelet, k8s-linuxpool1-40233731-0  Started container

jwalker343 commented 6 years ago

I rolled back to acs-engine v0.9.1, and with that version I'm at least able to run pods.

./acs-engine version
Version: v0.9.1
GitCommit: f9d0e574
GitTreeState: clean

chweidling commented 6 years ago

I tested the custom VNET for Windows nodes with the snapshot 12d7fc5a8143866f34c3c6a21b003eb96960b68b from 2018-01-09. I patched the generated azuredeploy.json file, so that it contains a variable subnet with the value of the subnet range of my custom VNET like this:

"subnet": ""
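
The manual patch can also be scripted so that it is repeatable after each generate run. This is a minimal sketch under two assumptions: that the generated template keeps its variables in a top-level "variables" object, and that the only change needed is the single "subnet" entry described above. The helper names and the example range are hypothetical.

```python
import copy
import json


def add_subnet_variable(template: dict, subnet_cidr: str) -> dict:
    """Return a copy of the ARM template with variables.subnet set to subnet_cidr."""
    patched = copy.deepcopy(template)  # leave the caller's dict untouched
    patched.setdefault("variables", {})["subnet"] = subnet_cidr
    return patched


def patch_file(path: str, subnet_cidr: str) -> None:
    """Rewrite the template file at path with the subnet variable added."""
    with open(path) as handle:
        template = json.load(handle)
    with open(path, "w") as handle:
        json.dump(add_subnet_variable(template, subnet_cidr), handle, indent=2)


# Example call; "10.0.0.0/16" is a placeholder, use the range of your own vnet:
# patch_file("azuredeploy.json", "10.0.0.0/16")
```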

The deployment worked and the Windows nodes were successfully created. I could deploy Windows containers. But inside the Windows pods, there was a problem with DNS: I could not resolve domain names, that is, I could not reach services inside my cluster.
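
To narrow down a DNS problem like this, one option is to test name resolution from inside the affected pod. The sketch below assumes Python is available in the container, which is not the case for the aspnet image used earlier in this thread, and `can_resolve` is a hypothetical helper.

```python
import socket


def can_resolve(name, resolver=socket.getaddrinfo):
    """Return True if name resolves to at least one address, False otherwise."""
    try:
        resolver(name, None)
    except socket.gaierror:
        # Resolution failed: unknown name or no reachable DNS server.
        return False
    return True


# Names worth trying from inside a pod (assuming the default cluster domain):
#   can_resolve("kubernetes.default.svc.cluster.local")  # in-cluster service
#   can_resolve("microsoft.com")                         # external name
```

If the in-cluster name fails while an external name resolves, the problem is more likely in the cluster DNS path than in general connectivity.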

jwalker343 commented 6 years ago

@chweidling I have not rebuilt a cluster using the latest snapshot, however according to you may just have to wait a little while and DNS may resolve itself?

chweidling commented 6 years ago

The problem does not disappear even after waiting for one hour.

pushkar-bitwise commented 6 years ago

Hi @jwalker343, we are facing a similar issue. Were you able to find the root cause or a solution?

SachinL9 commented 5 years ago

I am facing problems deploying a hybrid cluster in custom vnet.

Error: The template parameter 'masterSubnet' is not found.
acs-engine version: v0.21.2
k8s version: 1.11

Any idea when support for "Hybrid Cluster with Custom Vnet" will be added?
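
An error of this form typically means the parameters file supplies a value that the template does not declare. A quick way to list such names is sketched below; `undeclared_parameters` is a hypothetical helper, and accepting both a wrapped and a flat parameters file is an assumption about the file layout.

```python
def undeclared_parameters(template: dict, parameters_file: dict) -> list:
    """List parameter names supplied in the parameters file but not declared in the template."""
    declared = set(template.get("parameters", {}))
    # Assumption: the file is either {"parameters": {...}} or a flat {name: {"value": ...}} map.
    supplied = parameters_file.get("parameters", parameters_file)
    return sorted(name for name in supplied if name not in declared)
```

Loading azuredeploy.json and azuredeploy.parameters.json with `json.load` and passing them in would show whether 'masterSubnet' is present in one file but missing from the other.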

jwalker343 commented 5 years ago

I was able to successfully deploy a hybrid cluster in a custom vnet with 0.25.3. This issue is almost 1 year old, so I'm marking it closed.