Azure / acs-engine

WE HAVE MOVED: Please join us at Azure/aks-engine!
https://github.com/Azure/aks-engine

Hybrid Cluster with Custom Vnet Doesn't seem to work with acs-engine v0.11 #1949

Status: Closed (jwalker343 closed this issue 5 years ago)

jwalker343 commented 6 years ago

Is this a request for help?:

YES


Is this an ISSUE or FEATURE REQUEST? (choose one):

ISSUE


What version of acs-engine?:

0.11.0


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm):

Kubernetes 1.7.9

What happened: I downloaded the latest release of acs-engine and created a hybrid cluster in a custom vnet with the template below. Windows pods fail to start with an "Error Syncing Pod" error.

What you expected to happen: Windows pods should start and run properly.

How to reproduce it (as minimally and precisely as possible):

I have a vnet VN-Sandbox1-useast with 3 subnets:

k8smaster = 10.201.150.0/26
k8sagent = 10.201.155.0/26
k8sclustersubnet = 10.201.240.0/21
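For reference, the address layout can be sanity-checked offline. This is a minimal sketch using Python's standard `ipaddress` module; it takes the three subnets above plus the `vnetCidr` and `firstConsecutiveStaticIP` values from template.json. The `check_layout` helper is hypothetical and not part of acs-engine.

```python
import ipaddress

# Values copied from this report (subnets above, vnetCidr and
# firstConsecutiveStaticIP from template.json).
vnet = ipaddress.ip_network("10.201.0.0/16")
subnets = {
    "k8smaster": ipaddress.ip_network("10.201.150.0/26"),
    "k8sagent": ipaddress.ip_network("10.201.155.0/26"),
    "k8sclustersubnet": ipaddress.ip_network("10.201.240.0/21"),
}
first_master_ip = ipaddress.ip_address("10.201.150.10")


def check_layout(vnet, subnets, first_master_ip):
    """Hypothetical helper: return a list of human-readable layout problems."""
    problems = []
    # Every subnet must sit inside the vnet address space.
    for name, net in subnets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} ({net}) is outside {vnet}")
    # No two subnets may overlap.
    names = list(subnets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if subnets[a].overlaps(subnets[b]):
                problems.append(f"{a} overlaps {b}")
    # The first static master IP must belong to the master subnet.
    if first_master_ip not in subnets["k8smaster"]:
        problems.append(f"{first_master_ip} is not in k8smaster")
    return problems


print(check_layout(vnet, subnets, first_master_ip) or "layout looks consistent")
```

An empty result only rules out addressing mistakes in the input; it says nothing about what acs-engine generates from it.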

template.json:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
          "clusterSubnet": "10.201.240.0/21",
          "networkPolicy": "none"
      }
    },
    "masterProfile": {
        "count": 3,
        "dnsPrefix": "sandbox1-useast",
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8smaster",
        "firstConsecutiveStaticIP": "10.201.150.10",
        "vnetCidr": "10.201.0.0/16"
    },
    "agentPoolProfiles": [
    {
        "name": "linuxpool1",
        "count": 2,
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/xxxx/resourceGroups/RG-sandbox1-USEast/providers/Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8sagent",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Linux"
    },
    {
        "name": "winpool1",
        "count": 2,
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/xxxx/resourceGroups/RG-sandbox1-USEast/providers/Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8sagent",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows"
    }
    ],
    "windowsProfile": {
        "adminUsername": "k8sagentadmin",
        "adminPassword": "xxxx"
    },
    "linuxProfile": {
        "adminUsername": "azureuser",
        "ssh": {
            "publicKeys": [
                {
                    "keyData": "ssh-rsa xxxx email@email.com"
                }
            ]
        }
    },
    "servicePrincipalProfile": {
        "clientId": "xxxx",
        "secret": "xxxx"
    }
}
}

Run ./acs-engine generate template.json

Edit the generated azuredeploy.json file and add the subnet variable, as a workaround for https://github.com/Azure/acs-engine/issues/1767#issuecomment-345283959

Run az group deployment create --template-file "azuredeploy.json" --parameters "azuredeploy.parameters.json" -g RG-Sandbox1-useast -n VN-Sandbox1-useast

Update route tables: az network vnet subnet update -n k8smaster -g RG-Sandbox1-useast --vnet-name VN-sandbox1-useast --route-table <RT_NAME>

az network vnet subnet update -n k8sagent -g RG-Sandbox1-useast --vnet-name VN-Sandbox1-useast --route-table <RT_NAME>

Run a standard aspnet image:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: scratch-dep
spec:
  replicas: 1
  template:
    metadata:
      name: scratch-dep
      labels:
        app: asp
    spec:
      restartPolicy: Always
      nodeSelector:
        beta.kubernetes.io/os: windows
      containers:
      - name: asp
        imagePullPolicy: Always
        image: microsoft/aspnet:4.7.1-windowsservercore-1709
        tty: true
        stdin: true
        ports:
        - containerPort: 80

Anything else we need to know:

C:\k\kubelet.log

PS C:\k> cat .\kubelet.log
waiting to discover pod CIDR
Sleeping for 10s, and then waiting to discover pod CIDR
Ok.

No HNS network found, creating a new one...
VERBOSE: Invoke-HNSRequest Method[POST] Path[/networks/] Data[{
    "Subnets":  [
                    {
                        "GatewayAddress":  "/subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.1",
                        "AddressPrefix":  "10.201.244.0/24"
                    }
                ],
    "Name":  "l2bridge",
    "Type":  "L2Bridge"
}]
VERBOSE: Result: { "Error" : "The parameter is incorrect. ", "Success" : false }
Generated CNI Config [@{cniVersion=0.2.0; name=l2bridge; type=wincni.exe; master=Ethernet; capabilities=; ipam=; dns=; AdditionalArgs=System.Object[]}]
PS C:\k>

kubelet.err.log

~~~
{"level":"debug","msg":"[cni-net] Processing ADD command with args {ContainerID:c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Netns:none IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=scratch-dep-1585851119-h6jp0;K8S_POD_INFRA_CONTAINER_ID=c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Path:/opt/wincni.exe/bin;c:\\k\\cni}.","time":"2017-12-19T16:41:47Z"}
{"level":"error","msg":"[cni-net] Failed to parse network configuration, err:invalid IP address: /subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.1.","time":"2017-12-19T16:41:47Z"}
{"level":"debug","msg":"[cni-net] Plugin stopped.","time":"2017-12-19T16:41:47Z"}
E1219 16:41:47.936922     852 cni.go:238] Error adding network: unexpected end of JSON input
E1219 16:41:47.936922     852 cni.go:206] Error while adding to cni network: unexpected end of JSON input
E1219 16:41:49.457152     852 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "scratch-dep-1585851119-h6jp0_default" network: unexpected end of JSON input
E1219 16:41:49.457152     852 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "scratch-dep-1585851119-h6jp0_default" network: unexpected end of JSON input
E1219 16:41:49.457152     852 kuberuntime_manager.go:624] createPodSandbox for pod "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "scratch-dep-1585851119-h6jp0_default" network: unexpected end of JSON input
E1219 16:41:49.457152     852 pod_workers.go:182] Error syncing pod 7b60a790-e4db-11e7-b16b-000d3a14ca19 ("scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)"), skipping: failed to "CreatePodSandbox" for "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" with CreatePodSandboxError: "CreatePodSandbox for pod \"scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"scratch-dep-1585851119-h6jp0_default\" network: unexpected end of JSON input"
I1219 16:41:49.870640     852 kubelet.go:1917] SyncLoop (PLEG): "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)", event: &pleg.PodLifecycleEvent{ID:"7b60a790-e4db-11e7-b16b-000d3a14ca19", Type:"ContainerDied", Data:"c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d"}
W1219 16:41:49.870640     852 pod_container_deletor.go:77] Container "c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d" not found in pod's containers
I1219 16:41:50.268678     852 kuberuntime_manager.go:389] No ready sandbox for pod "scratch-dep-1585851119-h6jp0_default(7b60a790-e4db-11e7-b16b-000d3a14ca19)" can be found. Need to start a new one
I1219 16:41:50.268678     852 kuberuntime_manager.go:463] Container {Name:asp Image:microsoft/aspnet:4.7.1-windowsservercore-1709 Command:[] Args:[] WorkingDir: Ports:[{Name: HostPort:0 ContainerPort:80 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-14lt1 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:nil Stdin:true StdinOnce:false TTY:true} is dead, but RestartPolicy says that we should restart it.
{"level":"debug","msg":"[cni-net] Plugin wcn-net version .","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[net] Network interface: {Index:3 MTU:1500 Name:Ethernet 3 HardwareAddr:00:0d:3a:14:ce:87 Flags:up|broadcast|multicast} with IP addresses: [fe80::1ce9:58b:36af:8180/64 10.201.155.5/26]","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[net] Network interface: {Index:11 MTU:1500 Name:vEthernet (nat) HardwareAddr:00:15:5d:ce:fd:51 Flags:up|broadcast|multicast} with IP addresses: [fe80::58c1:329:b498:9a1a/64 172.31.48.1/20]","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[net] Network interface: {Index:1 MTU:-1 Name:Loopback Pseudo-Interface 1 HardwareAddr: Flags:up|loopback|multicast} with IP addresses: [::1/128 127.0.0.1/8]","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[cni-net] Plugin started.","time":"2017-12-19T16:41:50Z"}
{"level":"debug","msg":"[cni-net] Processing DEL command with args {ContainerID:c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Netns:none IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=scratch-dep-1585851119-h6jp0;K8S_POD_INFRA_CONTAINER_ID=c2eafaf460cd3623e7f1974df0aaea72617de9994efa8a199aefd493aa46dd0d Path:/opt/wincni.exe/bin;c:\\k\\cni}.","time":"2017-12-19T16:41:50Z"}
{"level":"error","msg":"[cni-net] Failed to parse network configuration, err:invalid IP address: /subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/Microsoft.1.","time":"2017-12-19T16:41:50Z"}
~~~

I've also tried this with "networkPolicy": "azure" and I get the same result. Please let me know if you need any more information.
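One observation on the logs: the GatewayAddress is a truncated Azure resource ID, not an IP address. The sketch below shows that the exact string from the log can be reproduced by cutting the master vnetSubnetId at its last dot and appending ".1". That this is what the node setup script does is an assumption based only on the log output, not on the acs-engine source, and `naive_gateway` is a hypothetical helper written for illustration.

```python
import ipaddress


def naive_gateway(value: str) -> str:
    """Hypothetical: keep everything before the last '.' and append '.1'."""
    return value[: value.rindex(".")] + ".1"


def is_ip(value: str) -> bool:
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# masterProfile.vnetSubnetId from template.json
subnet_id = (
    "/subscriptions/xxxx/resourceGroups/RG-Sandbox1-USEast/providers/"
    "Microsoft.Network/virtualNetworks/VN-sandbox1-USEast/subnets/k8smaster"
)
# The k8smaster address range
subnet_cidr = "10.201.150.0/26"

for candidate in (subnet_cidr, subnet_id):
    gateway = naive_gateway(candidate)
    print(f"{gateway} -> valid IP: {is_ip(gateway)}")
```

Given a CIDR, the same rule yields 10.201.150.1, which parses as an address. Given the resource ID, it yields the value that wincni rejects with "invalid IP address". If the assumption holds, this would also explain why adding a subnet variable containing a CIDR to azuredeploy.json helps.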

jwalker343 commented 6 years ago

Linux Containers run fine:

  Normal  Scheduled              23s   default-scheduler                   Successfully assigned scratch-dep-1136686056-xjpdv to k8s-linuxpool1-40233731-0
  Normal  SuccessfulMountVolume  23s   kubelet, k8s-linuxpool1-40233731-0  MountVolume.SetUp succeeded for volume "default-token-14lt1"
  Normal  Pulling                22s   kubelet, k8s-linuxpool1-40233731-0  pulling image "nginx"
  Normal  Pulled                 18s   kubelet, k8s-linuxpool1-40233731-0  Successfully pulled image "nginx"
  Normal  Created                17s   kubelet, k8s-linuxpool1-40233731-0  Created container
  Normal  Started                17s   kubelet, k8s-linuxpool1-40233731-0  Started container

jwalker343 commented 6 years ago

I rolled back to acs-engine v0.9.1, and with that version I'm at least able to run pods.

./acs-engine version
Version: v0.9.1
GitCommit: f9d0e574
GitTreeState: clean

chweidling commented 6 years ago

I tested the custom VNET for Windows nodes with the snapshot 12d7fc5a8143866f34c3c6a21b003eb96960b68b from 2018-01-09. I patched the generated azuredeploy.json file so that it contains a variable named subnet whose value is the address range of my custom VNET, like this:

"subnet": "10.1.0.0/16"

The deployment worked and the Windows nodes were created successfully. I could deploy Windows containers, but DNS did not work inside the Windows pods: I could not resolve domain names, so I could not reach services inside my cluster.
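For anyone who wants to script the patch rather than edit the file by hand, here is a minimal sketch. It assumes the generated azuredeploy.json is plain JSON with a top-level "variables" object; `add_subnet_variable` and `patch_file` are hypothetical helpers, and the CIDR you pass must be your own VNET range.

```python
import json


def add_subnet_variable(template: dict, cidr: str) -> dict:
    """Add variables.subnet if it is missing; an existing value is left alone."""
    variables = template.setdefault("variables", {})
    variables.setdefault("subnet", cidr)
    return template


def patch_file(path: str, cidr: str) -> None:
    """Load the ARM template at path, patch it, and write it back in place."""
    with open(path, encoding="utf-8") as f:
        template = json.load(f)
    add_subnet_variable(template, cidr)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(template, f, indent=2)


# Example on an in-memory template; call patch_file("azuredeploy.json", ...)
# to modify the real file.
demo = add_subnet_variable({"variables": {}}, "10.1.0.0/16")
print(json.dumps(demo, indent=2))
```

Back up the generated file first, since `patch_file` overwrites it and re-serializing changes the formatting.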

jwalker343 commented 6 years ago

@chweidling I have not rebuilt a cluster using the latest snapshot. However, according to https://github.com/Azure/acs-engine/issues/558#issuecomment-350348512, you may just have to wait a little while for DNS to start resolving?

chweidling commented 6 years ago

The problem does not disappear, even after waiting for an hour.

pushkar-bitwise commented 6 years ago

Hi @jwalker343, we are facing a similar issue. Were you able to find the root cause or a solution?

SachinL9 commented 5 years ago

I am facing problems deploying a hybrid cluster in a custom vnet.

Error: The template parameter 'masterSubnet' is not found.
acs-engine version: v0.21.2
k8s version: 1.11

Any idea when support for "Hybrid Cluster with Custom Vnet" will be added?

jwalker343 commented 5 years ago

I was able to successfully deploy a hybrid cluster in a custom vnet with 0.25.3. This issue is almost a year old, so I'm closing it.