aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

EKS Windows Nodes #69

Closed ofiliz closed 5 years ago

ofiliz commented 5 years ago

EKS Windows worker nodes to run Windows containers.

Update – 10/8/2019 Amazon EKS now fully supports Windows containers and Windows worker nodes. https://github.com/aws/containers-roadmap/issues/69#issuecomment-539641916

Get started by looking at the EKS documentation: https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html

vicpada commented 5 years ago

Hi, is there any ETA on this issue?

ofiliz commented 5 years ago

We are targeting a public beta in early 2019. Windows Server Containers is a beta feature in Kubernetes and we intend to support Windows following the same guidelines.

Please +1 this issue and tell us what you'd like to see (your preferred Windows Server version, Kubernetes version, features...) so we can plan accordingly! :)

tonysneed commented 5 years ago

Would like to see support for Windows Server 2019, which brings Windows containers much closer to feature parity with Linux containers. See http://stefanscherer.github.io/docker-on-windows-server-2019/.

vicpada commented 5 years ago

> Would like to see support for Windows Server 2019, which brings Windows containers much closer to feature parity with Linux containers. See http://stefanscherer.github.io/docker-on-windows-server-2019/.

Also AWS announced recently that WS2019 is supported: https://aws.amazon.com/about-aws/whats-new/2018/11/Windows-Server-1809/

jsamuel1 commented 5 years ago

Windows Server 2019 containers on a "current" (1.12/1.13) version of kubernetes would be great. Looks like 1.14 may have WS2019 as a minimum according to Kubernetes Sig Windows notes: https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit#

csdhome commented 5 years ago

> Windows Server 2019 containers on a "current" (1.12/1.13) version of kubernetes would be great. Looks like 1.14 may have WS2019 as a minimum according to Kubernetes Sig Windows notes: https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit#

I agree, we are looking to use Windows containers largely for a CI/CD workload on Kubernetes and would love to see this in place for EKS rather than having to manage our own K8s cluster.

stsukrov commented 5 years ago

Agree. Need Windows containers for CI/CD.

trimbleAdam commented 5 years ago

This is strongly desired.

netlancer2012 commented 5 years ago

windows server 2019 with kubernetes 1.13.+

stsukrov commented 5 years ago

> windows server 2019 with kubernetes 1.13.+

Do you mean, you got it working? Or is it just a request?

netlancer2012 commented 5 years ago

It's a request.

coreyjohnston commented 5 years ago

We'd love to see this feature.

msuiche commented 5 years ago

Yes. AFAIK, Azure Container Services does not support hybrid containers either. It would be very interesting to see AWS supporting this before Azure actually.

tabern commented 5 years ago

Hi all, Amazon EKS now supports Windows containers and Windows worker nodes as a public preview.

Learn more and get started here: https://github.com/aws/containers-roadmap/tree/master/preview-programs/eks-windows-preview

Please leave feedback and comments on the preview using this ticket.

mike-mosher commented 5 years ago

Found an issue trying to launch the 'amazon-eks-cfn-quickstart-windows.yaml' template.

There are three nested stacks in this template: 'EKSVPCStack', 'EKSLinuxWorkerStack', 'EKSWindowsWorkerStack'. The first two of these are providing a TemplateURL property pointing to an S3 URL, but the third nested stack (EKSWindowsWorkerStack) is pointing to a github url. Here is the resource:

  EKSWindowsWorkerStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://raw.githubusercontent.com/aws/containers-roadmap/master/preview-programs/eks-windows-preview/amazon-eks-windows-nodegroup.yaml
      ...

The documentation states that these URLs can only be S3 URLs. This causes the stack to fail and roll back with the following error:

    CREATE_FAILED   AWS::CloudFormation::Stack  EKSWindowsWorkerStack   TemplateURL must be an Amazon S3 URL.

I have verified that the stack can create successfully if that template is put in an S3 bucket and the TemplateURL property is replaced with this S3 URL.
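A rough pre-flight check for this class of failure could look like the sketch below. It is only a heuristic under stated assumptions: the helper names `looks_like_s3_url` and `non_s3_nested_stacks` are hypothetical, the template's `Resources` section is assumed to be already loaded into a Python dict, and the host test is an approximation rather than CloudFormation's actual validation logic.

```python
from urllib.parse import urlparse


def looks_like_s3_url(url: str) -> bool:
    """Heuristic: HTTPS URL on an amazonaws.com host that carries an S3 label."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host.endswith(".amazonaws.com"):
        return False
    # Accept hosts such as s3.amazonaws.com, s3-us-west-2.amazonaws.com,
    # or <bucket>.s3.<region>.amazonaws.com.
    return any(label == "s3" or label.startswith("s3-") for label in host.split("."))


def non_s3_nested_stacks(resources: dict) -> list:
    """Return logical IDs of nested stacks whose TemplateURL fails the heuristic."""
    bad = []
    for logical_id, resource in resources.items():
        if resource.get("Type") != "AWS::CloudFormation::Stack":
            continue  # only nested stacks carry a TemplateURL
        url = resource.get("Properties", {}).get("TemplateURL", "")
        if not looks_like_s3_url(url):
            bad.append(logical_id)
    return bad
```

Run against the quickstart template's resources, a check like this would flag only 'EKSWindowsWorkerStack', since its TemplateURL points at raw.githubusercontent.com.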

Here is the quick create link I used to launch the stack so that this issue can be reproduced (you just need to replace '<keyname>' with a valid keypair name):

https://us-west-2.console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/create/review?filter=active&templateURL=https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fcf-templates-2nak5ih76ymi-us-west-2%2F2019087WcT-test.yml&stackName=test-eks-windows&param_ClusterName=test-eks-windows&param_LinuxNodeImageId=ami-0ed0fe5ff74520950&param_WindowsNodeAutoScalingGroupDesiredCapacity=3&param_WindowsNodeAutoScalingGroupMaxSize=4&param_WindowsNodeAutoScalingGroupMinSize=1&param_WindowsNodeImageId=ami-047f9f0be88cb9b8b&param_WindowsNodeInstanceType=m5a.large&param_KeyName=<keyname>

tabern commented 5 years ago

@mike-mosher good call out - this was not right. I just updated the readme instructions to document how to download the YAML and upload to S3 so this works. We’re in the process of getting this into our service S3 buckets to simplify the setup as well.

cdenneen commented 5 years ago

Opened #227: I couldn't get the Windows example to run, and a few kube-system DaemonSets won't run.

tabern commented 5 years ago

We’ve added the windows-nodegroup and QuickStart YAML files to our production S3 buckets and updated the readme for provisioning the Windows worker nodes. This simplifies the setup process.

cdenneen commented 5 years ago

Are any 1.12 AMIs available for the README?

tabern commented 5 years ago

@cdenneen we're working on making Windows AMIs for v1.12 available

cdenneen commented 5 years ago

Currently, when I create a LoadBalancer service, it adds the Windows nodes to the ELB even though I have used a nodeSelector to deploy the service only to Linux nodes. Is there an annotation that excludes the Windows nodes from the LoadBalancers that are created?

vsiddharth commented 5 years ago

This problem is trickier than it appears and I don't think there is an easy way to achieve this today. Please take a look at https://github.com/kubernetes/kubernetes/issues/45234 for additional information.

https://kubernetes.io/docs/concepts/services-networking/service/ should provide a list of relevant annotations.

JasonChinsen commented 5 years ago

@tabern any idea when the Windows AMIs for v1.12.x will be available?

cmanikandan commented 5 years ago

There are typos in the three URLs in step 4: they are missing "master". The correct URLs should be:

curl -o webhook-create-signed-cert.sh https://raw.githubusercontent.com/aws/containers-roadmap/master/preview-programs/eks-windows-preview/webhook-create-signed-cert.sh
curl -o webhook-patch-ca-bundle.sh https://raw.githubusercontent.com/aws/containers-roadmap/master/preview-programs/eks-windows-preview/webhook-patch-ca-bundle.sh
curl -o vpc-admission-webhook-deployment.yaml https://raw.githubusercontent.com/aws/containers-roadmap/master/preview-programs/eks-windows-preview/vpc-admission-webhook-deployment.yaml
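For anyone scripting the download step, the sketch below shows how the three raw URLs are assembled with the branch segment in place. The helper names `raw_url` and `curl_commands` are hypothetical; the repository path and file names are the ones quoted in this comment.

```python
# File names taken from the corrected step 4 commands.
FILES = [
    "webhook-create-signed-cert.sh",
    "webhook-patch-ca-bundle.sh",
    "vpc-admission-webhook-deployment.yaml",
]


def raw_url(filename: str, branch: str = "master") -> str:
    """Build the raw.githubusercontent.com URL, including the branch segment."""
    return (
        "https://raw.githubusercontent.com/aws/containers-roadmap/"
        f"{branch}/preview-programs/eks-windows-preview/{filename}"
    )


def curl_commands() -> list:
    """Return one 'curl -o <file> <url>' command line per file."""
    return [f"curl -o {name} {raw_url(name)}" for name in FILES]


if __name__ == "__main__":
    print("\n".join(curl_commands()))
```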

cdenneen commented 5 years ago

@tabern can you update the README.md per @cmanikandan's suggestion? Any updates on the 1.12.x AMIs? Or are 1.13/1.14 coming, and we should wait for those?

vsiddharth commented 5 years ago

@cdenneen Is there anything particular you are looking for in the 1.12.x worker AMIs?

cdenneen commented 5 years ago

@vsiddharth just looking to use latest EKS cluster of 1.12 and have nodes matching instead of windows NG at 1.11

JasonChinsen commented 5 years ago

I am looking for k8s 1.13 support so that I can run Argo-workflow on windows (https://github.com/aws/containers-roadmap/issues/245)

ghost commented 5 years ago

It seems a Windows container can't find services in the same namespace. Is this intended?

C:> nslookup web.default.svc.cluster.local ==> Succeeds
C:> nslookup web ==> Says "*** kube-dns.kube-system.svc.cluster.local can't find web: Non-existent domain".

I used the AMI 'ami-0c4de1c5133449009' for nodes and the 'mcr.microsoft.com/windows/servercore:1809' image for containers. I had to set the DNS suffix manually to make it work.
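The behaviour is consistent with a missing DNS search suffix in the container. The sketch below is a simplified model of search-list expansion, not the Windows resolver's actual algorithm (it ignores rules such as `ndots`). The function name `candidate_names` is hypothetical, and the suffix list is an assumption based on what pods in the `default` namespace usually get on Linux nodes.

```python
def candidate_names(name: str, search_suffixes: list) -> list:
    """Names a resolver would try, in order, for a possibly short name (simplified)."""
    if name.endswith("."):
        return [name]  # already fully qualified, so no suffix is applied
    # Try each search suffix first, then the name exactly as typed.
    candidates = [f"{name}.{suffix}" for suffix in search_suffixes]
    candidates.append(name)
    return candidates


# Assumed search list for a pod in the "default" namespace.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

if __name__ == "__main__":
    print(candidate_names("web", SEARCH))  # includes web.default.svc.cluster.local
    print(candidate_names("web", []))      # only the bare name "web" is tried
```

With an empty search list only the bare name `web` is queried, which matches the "Non-existent domain" answer above, while the fully qualified name resolves.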

rommelandrea commented 5 years ago

With the default CloudFormation template there is an issue with disk space. Is it possible to attach disks to Windows and Linux nodes?

ghost commented 5 years ago

This is not about attaching disks to nodes, but I tried EBS and AzureFile volumes for Windows pods and ran into problems. Mounting EBS volumes to a Windows pod failed with a timeout, and mounting SMB (AzureFile) to a Windows pod frequently fails. The errors are:

"MountVolume.SetUp failed for volume "data" : azureMount: SmbGlobalMapping failed: exit status 1, only SMB mount is supported now, output: "New-SmbGlobalMapping : Generic failure \r\nAt line:1 char:190"

These volume types work well for Linux pods if I run sudo yum install cifs-utils on the nodes.

cdenneen commented 5 years ago

@tabern @vsiddharth is there any plan to release new windows images?

Is there any way to update the build scripts to support rolling our own Windows image? https://github.com/awslabs/amazon-eks-ami

bcmedeiros commented 5 years ago

Any guidelines to add Windows worker nodes to an already existing cluster? I have to run some windows workload, but I need to run it together with my other containers in the same cluster.

anjanitsip commented 5 years ago

I am getting the error below:

User is not authorized to perform: iam:CreatePolicy on resource: policy test-on-cloud-cluster-UnassignPrivateIpAddresses

anjanitsip commented 5 years ago

I built the cluster, but while running the windows-server-iis application I am getting a network-related error:

D:\kubernete>kubectl get pods
NAME                                  READY   STATUS              RESTARTS   AGE
windows-server-iis-7dcfc7c79b-4z4v7   0/1     ContainerCreating   0          31m

D:\kubernetes>kubectl describe pod windows-server-iis-7dcfc7c79b-4z4v7
Name:               windows-server-iis-7dcfc7c79b-4z4v7
Namespace:          default
Priority:           0
PriorityClassName:
Node:               ip-192-168-196-65.ec2.internal/192.168.196.65
Start Time:         Tue, 25 Jun 2019 14:33:23 +0530
Labels:             app=windows-server-iis
                    pod-template-hash=3879737356
                    tier=backend
                    track=stable
Annotations:
Status:             Pending
IP:
Controlled By:      ReplicaSet/windows-server-iis-7dcfc7c79b
Containers:
  windows-server-iis:
    Container ID:
    Image:          mcr.microsoft.com/windows/servercore:1809
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    Command:
      powershell.exe
      -command
      Add-WindowsFeature Web-Server; Invoke-WebRequest -UseBasicParsing -Uri
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-54jzq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-54jzq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-54jzq
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=windows
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                 From                                     Message
  Normal   Scheduled               31m                 default-scheduler                        Successfully assigned default/windows-server-iis-7dcfc7c79b-4z4v7 to ip-192-168-196-65.ec2.internal
  Warning  FailedCreatePodSandBox  31m                 kubelet, ip-192-168-196-65.ec2.internal  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "ae0125f61a163e344b3ce43d96203b8ba32d0f6029b3b0f0d2c27eb4df250652" network for pod "windows-server-iis-7dcfc7c79b-4z4v7": NetworkPlugin cni failed to set up pod "windows-server-iis-7dcfc7c79b-4z4v7_default" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address, failed to clean up sandbox container "ae0125f61a163e344b3ce43d96203b8ba32d0f6029b3b0f0d2c27eb4df250652" network for pod "windows-server-iis-7dcfc7c79b-4z4v7": NetworkPlugin cni failed to teardown pod "windows-server-iis-7dcfc7c79b-4z4v7_default" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address]
  Normal   SandboxChanged          1m (x117 over 31m)  kubelet, ip-192-168-196-65.ec2.internal  Pod sandbox changed, it will be killed and re-created.

anjanitsip commented 5 years ago

[failed to set up sandbox container "ae0125f61a163e344b3ce43d96203b8ba32d0f6029b3b0f0d2c27eb4df250652" network for pod "windows-server-iis-7dcfc7c79b-4z4v7": NetworkPlugin cni failed to set up pod "windows-server-iis-7dcfc7c79b-4z4v7_default" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address, failed to clean up sandbox container "ae0125f61a163e344b3ce43d96203b8ba32d0f6029b3b0f0d2c27eb4df250652" network for pod "windows-server-iis-7dcfc7c79b-4z4v7": NetworkPlugin cni failed to teardown pod "windows-server-iis-7dcfc7c79b-4z4v7_default" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address]
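When triaging many pods with this failure, it can help to pull the missing label name out of the event text. The sketch below is a hypothetical helper (`missing_labels` is not part of any AWS or Kubernetes tool); it relies only on the message wording shown above and would need adjusting if that wording changes.

```python
import re

# Matches the phrase used in the CNI error above and captures the label key.
MISSING_LABEL = re.compile(r"pod does not have label ([\w./-]+)")


def missing_labels(event_message: str) -> list:
    """Return the distinct label keys the CNI plugin reported as missing, sorted."""
    return sorted(set(MISSING_LABEL.findall(event_message)))


if __name__ == "__main__":
    message = (
        "NetworkPlugin cni failed to set up pod network: failed to parse "
        "Kubernetes args: pod does not have label "
        "vpc.amazonaws.com/PrivateIPv4Address, failed to clean up sandbox"
    )
    print(missing_labels(message))
```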

nigel-decosta-rft commented 5 years ago

I get frequent network-related errors when deploying to EKS Windows nodes. The typical symptom is that the pods cannot access ClusterIP addresses within the cluster, although they can access pod IP addresses. To check, I run nslookup on the Windows pod; when the node is faulty, this times out trying to connect to the core-dns ClusterIP.

The workaround is to restart the Windows nodes and the vpc-resource-controller. This may resolve the issue temporarily (a few hours at best). More recently I am finding that the fix lasts only a few minutes.

Is anybody else having this problem?

cmboughey commented 5 years ago

To expand upon the previous poster, this is my symptom.

This is an issue I have seen and cannot overcome; any suggestions would be greatly appreciated.

Once a Windows machine has been deployed, let us say I have 5 slots for pods, each with their own IP. The networking appears to be valid: I can consume any ClusterIP that the pod is allowed to use, which at the moment is most likely running on Linux (CoreDNS and Jenkins).

Now, after I have reached the deployment limit, I find that I lose the ability to connect to those ClusterIPs, and networking fails for internal communications. I still have external connectivity.

This can be reproduced by deleting pods and waiting for them to be recreated: if I then try to connect, it fails. No port is available to connect to. The only solution is to redeploy the EC2 instance.

Is this a known issue? Is there a better workaround?

nigel-decosta-rft commented 5 years ago

@cmboughey - This does sound similar to the issues I have been facing. What exactly do you mean when you say "Once a Window machine has been deployed, let us say I have 5 slots for pods, each with their own IP"?

I have a script which reboots the EC2 instances which does make it a bit easier. Still a pain.

cmboughey commented 5 years ago

@nigeldecosta - Basically, depending on the size of your compute instance, there is a limit on the number of IP addresses that can be used by the instance. As part of the CNI configuration, AWS uses elastic network interfaces (ENIs) and assigns them to the machine. As for the script, that's a pain! It may be usable if you're just deploying pods which don't need to be recreated often, but it is not usable for Jenkins.

nigel-decosta-rft commented 5 years ago

@cmboughey Is this related to the primary + secondary private IPs on the Windows EC2 instances? I am currently using EC2 type m4.16xlarge. Could I expect the limit to be higher on other types? I couldn't see where such limits are listed. If you have a link that would help. Thanks.

cmboughey commented 5 years ago

@nigeldecosta

realrill commented 5 years ago

Hi all. I've had the same issue that @anjanitsip reported: ...network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address

I've added the required label to the Windows IIS sample YAML with a random IP from the subnet where the nodes are. I have also restarted the Windows instance, the vpc-resource-controller and the aws-node DaemonSet.

It ~solved~ did not solve the issue. See the updates at the bottom.

VPC-* and aws-node are up, running and healthy. All logs are OK, so I don't know where the label or the IP should come from.

vpc-resource-* container log:

I0822 08:02:30.396621       1 ipaddress.go:77] IPAddressProvider initialized instance yyyyyyyyyyyy resource pool {Capacity:5 InUse:map[xxxxxxxxx:node] Warm:[xxxxxxxxx xxxxxxxxx xxxxxxxxx] Pending:0}.
I0822 08:02:30.396826       1 manager.go:190] Node manager advertising resource vpc.amazonaws.com/PrivateIPv4Address quantity 5 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.400701       1 watcher.go:121] Pod watcher cache synced.
I0822 08:02:30.400773       1 manager.go:88] Node manager is starting.
I0822 08:02:30.400787       1 controller.go:155] Controller started.
I0822 08:02:30.400863       1 watcher.go:130] Pod watcher worker 1 started.
I0822 08:02:30.407539       1 manager.go:141] Node manager added node {name:kkkkkkkk.us-east-2.compute.internal instanceID:yyyyyyyyyyyy instanceType:t3.medium os:windows managed:true}.
I0822 08:02:30.407567       1 watcher.go:190] Node watcher completed processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.407713       1 watcher.go:190] Pod watcher ignoring pod coredns-54989b8657-b894j on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407811       1 watcher.go:190] Pod watcher ignoring pod spotinst-kubernetes-cluster-controller-linux-785d945579-25287 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407871       1 watcher.go:190] Pod watcher ignoring pod vpc-resource-controller-85c8f9475d-jpgcf on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407885       1 watcher.go:190] Pod watcher ignoring pod aws-node-htjkv on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407942       1 watcher.go:190] Pod watcher ignoring pod kube-proxy-n9jkc on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408001       1 watcher.go:190] Pod watcher ignoring pod vpc-admission-webhook-deployment-67bd7fb7d5-54c9k on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408020       1 watcher.go:190] Pod watcher ignoring pod coredns-54989b8657-jgjtt on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408085       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-z2q7h on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.408096       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-z2q7h.
I0822 08:02:30.408146       1 watcher.go:190] Pod watcher ignoring pod spotinst-kubernetes-cluster-controller-windows-75d57fd74c-2jqw2 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:40.401067       1 reconciler.go:30] Node manager reconciler started.
I0822 08:02:40.401118       1 reconciler.go:102] Reconciler worker 1 starting processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401145       1 reconciler.go:123] Reconciler checking resource vpc.amazonaws.com/ENI warmpool size 0 desired 0 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401153       1 reconciler.go:123] Reconciler checking resource vpc.amazonaws.com/PrivateIPv4Address warmpool size 3 desired 3 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401159       1 reconciler.go:106] Reconciler worker 1 completed processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:05:40.094352       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-z2q7h on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:23:55.899838       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-fn9n6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:23:55.899868       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-fn9n6.
I0822 08:24:40.098314       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-fn9n6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:32:01.610958       1 watcher.go:247] Pod watcher processing deleted pod aws-node-htjkv on node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:32:31.641403       1 watcher.go:190] Pod watcher ignoring pod aws-node-zlnt7 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:51:55.080317       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-8xj7m on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:51:55.080341       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-8xj7m.
I0822 08:53:10.102789       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-8xj7m on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:58:34.807919       1 watcher.go:194] Pod watcher processing pod windows-server-iis-b4b96d88c-smn64 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:58:34.807955       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-b4b96d88c-smn64.
I0822 09:14:37.721407       1 watcher.go:194] Pod watcher processing pod bash-77ccdf87d9-khxf6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 09:14:37.721430       1 watcher.go:236] Pod watcher completed processing pod bash-77ccdf87d9-khxf6.
I0822 09:16:20.106958       1 watcher.go:247] Pod watcher processing deleted pod bash-77ccdf87d9-khxf6 on node kkkkkkkk.us-east-2.compute.internal.
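To follow what happens to each pod in a log like this, the lines can be grouped per pod. The sketch below assumes the usual klog header layout (severity letter, month and day, time, thread ID, source file and line, then the message) and the "Pod watcher processing ..." wording seen above; `pod_timeline` is a hypothetical helper, not part of the vpc-resource-controller.

```python
import re
from collections import defaultdict

# klog header: I0822 08:02:30.396621       1 watcher.go:194] message
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[^\]]+)\] (?P<message>.*)$"
)
# Only the "processing pod" / "processing deleted pod" messages are of interest.
POD_EVENT = re.compile(
    r"Pod watcher processing (?P<deleted>deleted )?pod (?P<pod>\S+) on node"
)


def pod_timeline(lines) -> dict:
    """Map pod name -> list of (kind, time), where kind is 'seen' or 'deleted'."""
    timeline = defaultdict(list)
    for line in lines:
        header = KLOG_LINE.match(line.strip())
        if header is None:
            continue  # not a klog-formatted line
        event = POD_EVENT.search(header.group("message"))
        if event is None:
            continue  # e.g. "ignoring pod" or "completed processing pod"
        kind = "deleted" if event.group("deleted") else "seen"
        timeline[event.group("pod")].append((kind, header.group("time")))
    return dict(timeline)
```

Applied to the log above, this would show, for example, that windows-server-iis-7fb74d9fc-fn9n6 was processed at 08:23:55 and deleted at 08:24:40.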

aws-node container log

===== Starting installing AWS-CNI =========
===== Starting amazon-k8s-agent ===========

vpc-admission* container log

I0821 15:32:00.477762       1 main.go:64] Initializing vpc-admission-webhook version beta.
I0821 15:32:00.478603       1 main.go:76] Webhook Server started.

I am more than happy to provide more logs for investigation; just tell me what you need.

update It does not solve the issue. The windows-server-iis container is in CrashLoopBackOff...

update The container goes into the Error state and then restarts itself. While it's running I am able to exec into it, but I can't see the reason for the error.

update It looks like the manually attached label for IPv4Address does not affect the container networking. I am investigating as much as I can but am slowly running out of ideas. See the error below.

kubectl logs windows-server-iis-b4b96d88c-lnc6n

Success Restart Needed Exit Code      Feature Result
------- -------------- ---------      --------------
True    No             Success        {Common HTTP Features, Default Documen...
Invoke-WebRequest : The remote name could not be resolved:
'dotnetbinaries.blob.core.windows.net'
At line:1 char:32
+ ... Web-Server; Invoke-WebRequest -UseBasicParsing -Uri 'https://dotnetbi ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:Htt
   pWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShe
   ll.Commands.InvokeWebRequestCommand

C:\ServiceMonitor.exe : The term 'C:\ServiceMonitor.exe' is not recognized as
the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is
correct and try again.
At line:1 char:311
+ ... ml>' > C:\inetpub\wwwroot\default.html; C:\ServiceMonitor.exe 'w3svc' ...
+                                             ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\ServiceMonitor.exe:String) [
   ], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

A few logs from the Windows node (kubelet):

E0822 12:52:51.499796    3428 remote_runtime.go:115] StopPodSandbox "98eed946e8662ac4a6fb5f76a24ce1a517e13dcdeefd4fc92ee3e6330abbf7fe" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "windows-server-iis-7fb74d9fc-fn9n6_default" network: failed to parse Kubernetes args: failed to get pod windows-server-iis-7fb74d9fc-fn9n6: pods "windows-server-iis-7fb74d9fc-fn9n6" not found

update The Flanneld service is missing from the node. The question now is why, and which step missed the installation.

update https://github.com/aws/containers-roadmap/issues/273 It looks to me like this could be the source of my issue. Could someone senior please confirm or decline?

final update Solved. It was mostly user error: I had provisioned the instances in an environment with strict network policies, and a few ports were blocked.

dcopestake commented 5 years ago

~Should ENIs be dynamically allocated to Windows nodes like they are with Linux nodes?~

The reason I ask is that I've got two nodes in my cluster, one on Linux and one on Windows, however the Windows node only seems to be able to run 5 pods at a time (both instances are t3.medium) whereas the Linux node can handle 17. I can see that the Linux node has 3 ENIs and 18 total private IPs, however the Windows node seems to only have a single ENI and a single private IP.

Update: @vsiddharth kindly responded via email and confirmed that ENIs are in fact not dynamically allocated for the Windows nodes at the moment.
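The numbers reported here line up with the pod-limit arithmetic AWS documents for its VPC CNI, sketched below. Treat it as an illustration under assumptions: the function names are hypothetical, the figure of 6 IPv4 addresses per ENI for t3.medium is inferred from the 3 ENIs and 18 private IPs mentioned above, and the Windows formula assumes a single ENI whose primary address is reserved for the node.

```python
def linux_max_pods(enis: int, ips_per_eni: int) -> int:
    """Each ENI keeps its primary IP for itself; +2 accounts for host-network pods."""
    return enis * (ips_per_eni - 1) + 2


def windows_max_pods(ips_per_eni: int) -> int:
    """Single ENI: every pod takes a secondary IP, the primary stays with the node."""
    return ips_per_eni - 1


if __name__ == "__main__":
    # t3.medium, as inferred from the thread: 3 ENIs with 6 IPv4 addresses each.
    print(linux_max_pods(enis=3, ips_per_eni=6))   # 17
    print(windows_max_pods(ips_per_eni=6))         # 5
```

The results, 17 and 5, match the pod counts observed on the two t3.medium nodes and the `Capacity:5` value in the vpc-resource-controller log posted earlier in the thread.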

rparsonsbb commented 5 years ago

Is there an ETA on windows nodes for 1.14?

dcopestake commented 5 years ago

Submitted pull request #453 - which adds versions of the quickstart shell scripts written in PowerShell - just in case anyone wanted to get going with the preview but didn't have access to a bash shell.

smiron commented 5 years ago

> Hi all. I've had the same issue as @anjanitsip commented . ...network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address

I've added the required label to the windows iis sample yaml with a random IP form the subnet where the nodes are. Also, I have restarted the Windows instance, the vpc-resource-controller and the aws-node DaemonSet too.

It ~solved~ does not solved the issue. See update at the bottom.

VPC-* and aws-node are up, running and healthy. All logs are ok so I don't know where the label or the ip should come from.

vpc-resource-* container log:

I0822 08:02:30.396621       1 ipaddress.go:77] IPAddressProvider initialized instance yyyyyyyyyyyy resource pool {Capacity:5 InUse:map[xxxxxxxxx:node] Warm:[xxxxxxxxx xxxxxxxxx xxxxxxxxx] Pending:0}.
I0822 08:02:30.396826       1 manager.go:190] Node manager advertising resource vpc.amazonaws.com/PrivateIPv4Address quantity 5 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.400701       1 watcher.go:121] Pod watcher cache synced.
I0822 08:02:30.400773       1 manager.go:88] Node manager is starting.
I0822 08:02:30.400787       1 controller.go:155] Controller started.
I0822 08:02:30.400863       1 watcher.go:130] Pod watcher worker 1 started.
I0822 08:02:30.407539       1 manager.go:141] Node manager added node {name:kkkkkkkk.us-east-2.compute.internal instanceID:yyyyyyyyyyyy instanceType:t3.medium os:windows managed:true}.
I0822 08:02:30.407567       1 watcher.go:190] Node watcher completed processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.407713       1 watcher.go:190] Pod watcher ignoring pod coredns-54989b8657-b894j on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407811       1 watcher.go:190] Pod watcher ignoring pod spotinst-kubernetes-cluster-controller-linux-785d945579-25287 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407871       1 watcher.go:190] Pod watcher ignoring pod vpc-resource-controller-85c8f9475d-jpgcf on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407885       1 watcher.go:190] Pod watcher ignoring pod aws-node-htjkv on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407942       1 watcher.go:190] Pod watcher ignoring pod kube-proxy-n9jkc on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408001       1 watcher.go:190] Pod watcher ignoring pod vpc-admission-webhook-deployment-67bd7fb7d5-54c9k on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408020       1 watcher.go:190] Pod watcher ignoring pod coredns-54989b8657-jgjtt on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408085       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-z2q7h on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.408096       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-z2q7h.
I0822 08:02:30.408146       1 watcher.go:190] Pod watcher ignoring pod spotinst-kubernetes-cluster-controller-windows-75d57fd74c-2jqw2 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:40.401067       1 reconciler.go:30] Node manager reconciler started.
I0822 08:02:40.401118       1 reconciler.go:102] Reconciler worker 1 starting processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401145       1 reconciler.go:123] Reconciler checking resource vpc.amazonaws.com/ENI warmpool size 0 desired 0 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401153       1 reconciler.go:123] Reconciler checking resource vpc.amazonaws.com/PrivateIPv4Address warmpool size 3 desired 3 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401159       1 reconciler.go:106] Reconciler worker 1 completed processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:05:40.094352       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-z2q7h on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:23:55.899838       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-fn9n6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:23:55.899868       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-fn9n6.
I0822 08:24:40.098314       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-fn9n6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:32:01.610958       1 watcher.go:247] Pod watcher processing deleted pod aws-node-htjkv on node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:32:31.641403       1 watcher.go:190] Pod watcher ignoring pod aws-node-zlnt7 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:51:55.080317       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-8xj7m on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:51:55.080341       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-8xj7m.
I0822 08:53:10.102789       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-8xj7m on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:58:34.807919       1 watcher.go:194] Pod watcher processing pod windows-server-iis-b4b96d88c-smn64 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:58:34.807955       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-b4b96d88c-smn64.
I0822 09:14:37.721407       1 watcher.go:194] Pod watcher processing pod bash-77ccdf87d9-khxf6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 09:14:37.721430       1 watcher.go:236] Pod watcher completed processing pod bash-77ccdf87d9-khxf6.
I0822 09:16:20.106958       1 watcher.go:247] Pod watcher processing deleted pod bash-77ccdf87d9-khxf6 on node kkkkkkkk.us-east-2.compute.internal.

aws-node container log

===== Starting installing AWS-CNI =========
===== Starting amazon-k8s-agent ===========

vpc-admission* container log

I0821 15:32:00.477762       1 main.go:64] Initializing vpc-admission-webhook version beta.
I0821 15:32:00.478603       1 main.go:76] Webhook Server started.

I am more than happy to provide more logs for investigation; just tell me what you need.

Update: It does not solve the issue. The windows-server-iis container is in CrashLoopBackOff...

Update: The container goes into the Error state and then restarts itself. While it is running I am able to exec into it, but I can't see the reason for the error.

Update: It looks like the manually attached label for IPv4Address does not affect the container networking. I am investigating as much as I can but am slowly running out of ideas. See the error below.

kubectl logs windows-server-iis-b4b96d88c-lnc6n

Success Restart Needed Exit Code      Feature Result
------- -------------- ---------      --------------
True    No             Success        {Common HTTP Features, Default Documen...
Invoke-WebRequest : The remote name could not be resolved:
'dotnetbinaries.blob.core.windows.net'
At line:1 char:32
+ ... Web-Server; Invoke-WebRequest -UseBasicParsing -Uri 'https://dotnetbi ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:Htt
   pWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShe
   ll.Commands.InvokeWebRequestCommand

C:\ServiceMonitor.exe : The term 'C:\ServiceMonitor.exe' is not recognized as
the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is
correct and try again.
At line:1 char:311
+ ... ml>' > C:\inetpub\wwwroot\default.html; C:\ServiceMonitor.exe 'w3svc' ...
+                                             ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\ServiceMonitor.exe:String) [
   ], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
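The first error in the output above is a DNS failure (`The remote name could not be resolved`), and the missing `C:\ServiceMonitor.exe` appears to follow from the failed download. A minimal sketch for checking name resolution from wherever Python happens to be available (a node, a debug pod, a workstation in the same network) could look like this. The helper name `resolves` is made up for illustration; only the hostname comes from the log.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the local resolver returns at least one address for hostname."""
    try:
        # getaddrinfo uses the system resolver, so it reflects the DNS
        # configuration of the machine or container it runs in.
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False
```

Calling `resolves("dotnetbinaries.blob.core.windows.net")` from inside the affected network would show whether the failure is specific to the pod or affects the whole environment. It only tests resolution, not whether the host is reachable afterwards.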

A few logs from the Windows node (kubelet):

E0822 12:52:51.499796    3428 remote_runtime.go:115] StopPodSandbox "98eed946e8662ac4a6fb5f76a24ce1a517e13dcdeefd4fc92ee3e6330abbf7fe" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "windows-server-iis-7fb74d9fc-fn9n6_default" network: failed to parse Kubernetes args: failed to get pod windows-server-iis-7fb74d9fc-fn9n6: pods "windows-server-iis-7fb74d9fc-fn9n6" not found

Update: The Flanneld service is missing from the node. The question now is why, and which step failed to install it.

Update: #273 looks to me like it could be the source of my issue. Could someone senior please confirm or rule it out?

Final update: Solved. It was mostly user error: I had provisioned the instances in an environment with strict network policies, and a few ports were blocked.

How did you solve it in the end? Please share.

realrill commented 5 years ago

IIRC port 443 was blocked, which caused the malfunction on the Windows worker/pod side.
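For anyone hitting something similar, a quick way to spot a blocked port is a plain TCP connect test. This is only a rough sketch under assumptions: `can_connect` is a hypothetical helper, Python has to be available on the machine you test from, and the endpoint to test (for example your cluster's API server hostname) is something you need to fill in yourself.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # a firewall that drops or rejects traffic shows up as a timeout or error.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A `False` result only says the TCP connection could not be opened from that machine. It does not tell you which security group, network ACL, or firewall rule is responsible.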

dcopestake commented 5 years ago

Is the EKS Windows preview still actually running/being developed? There doesn't seem to be a huge amount of activity here (other than people raising issues), and today I got an email from AWS saying that 1.11 is going to be deprecated in early November, which will (presumably) make it impossible to actually run a 1.11 cluster with Windows nodegroups. So I'm not sure what the plan is?

nigel-decosta-rft commented 5 years ago

I thought the public release of Windows EKS was imminent. Not pleased that 1.11 will be deprecated before we get a supported version of Windows EKS.