Open bremnes opened 7 years ago
@JiangtianLi can you speak to this?
I did some more testing today and found that the 1709 images were working, but images without the 1709 tag crashed with the exception shown above. So even images like microsoft/aci-helloworld:windows aren't working anymore.
I would like to be able to choose the OS/edition of the node pool(s). As it is now, we aren't really getting stable infrastructure if the configuration suddenly varies wildly from environment to environment depending on when you happened to create the cluster. Right now, for instance, we have to make sure that everybody has the Fall Creators Update, in addition to making changes to all the Dockerfiles. As for the VSTS hosted build agent, I have no idea whether it supports 1709 layers, so that might be another thing we have to adjust for in the wake of this.
The remoting experience I had matches this blog post, so the agent OS/edition has certainly changed: https://blogs.msdn.microsoft.com/freddyk/2017/11/01/1709-and-nav-on-docker/
Attempting to change the base image to microsoft/aspnetcore:2.0-nanoserver-1709 in one of our Dockerfiles didn't work out, as we now get an error complaining about access being denied to a file (the same as when running the base image itself, see below).
To narrow it down, I created a fresh standalone VM from the Marketplace based on the "Windows Server, version 1709 with Containers" template to compare with one of the Kubernetes agents. I remoted into both of them and ran plain docker commands to see what could be wrong.
According to docker version, and the OS Name and OS Version from systeminfo | findstr /C:"OS", they appear to have the same configuration.
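A comparison like this can also be scripted. The following is a minimal sketch, assuming the systeminfo output has been captured as text from each host; the function names parse_os_info and same_os are hypothetical and not part of any tool mentioned in this thread. Note that matching values only show that the OS name and build agree, which, as this comment describes, did not rule out other differences between the two machines.

```python
import re


def parse_os_info(systeminfo_text):
    """Extract the OS name, version and build from systeminfo-style text.

    Lines other than "OS Name" and "OS Version" (for example "OS Manufacturer"
    or "BIOS Version", which findstr /C:"OS" also matches) are ignored.
    """
    info = {}
    for line in systeminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        value = value.strip()
        if key == "OS Name":
            info["name"] = value
        elif key == "OS Version":
            # Expected form: "10.0.16299 N/A Build 16299"
            match = re.search(r"Build\s+(\d+)", value)
            info["version"] = value.split()[0] if value else ""
            info["build"] = int(match.group(1)) if match else None
    return info


def same_os(host_a_text, host_b_text):
    """Return True when both hosts report the same OS name and build number."""
    a = parse_os_info(host_a_text)
    b = parse_os_info(host_b_text)
    return a.get("name") == b.get("name") and a.get("build") == b.get("build")
```

For example, feeding it the "Old nodes" and "New nodes" values quoted in the original issue text would report a mismatch (build 14393 versus 16299).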
On Standalone VM these images work:
docker run microsoft/aspnetcore:2.0-nanoserver-1709
docker run microsoft/iis:windowsservercore-1709
Doesn't work, but from what I understand that is expected given the 1709 update:
docker run microsoft/aspnetcore:2.0
Error response from daemon: container 64e00888b063e10f59841fb3ff68a321199a6e4cb6f73a224b7b9dd2b3340208 encountered an error during CreateContainer: failure in a Windows system call: The operating system of the container does not match the operating system of the host.
Kubernetes Windows node works:
docker run microsoft/iis:windowsservercore-1709
Doesn't work:
docker run microsoft/aspnetcore:2.0-nanoserver-1709
docker: failed to register layer: re-exec error: exit status 1: output: remove \\?\C:\ProgramData\docker\windowsfilter\eecc1639c6223893c5fef33bcc29aae8f969ed7acd1b3f45f51a765d3ba494fd\UtilityVM\Files\Windows\System32\diagtrack.dll: Access is denied.
docker run microsoft/aspnetcore:2.0
(same as for Standalone)
Am I missing something here? It would be great if anybody could tell me where I've made an error, either when creating the cluster or by making a wrong assumption in the debugging session shown above. Alternatively, it would help if anybody is able to confirm and/or reproduce the bug. As it is now we aren't able to use ACS Kubernetes, which is quite unfortunate in my opinion.
(This question was originally targeting documentation. As Windows containers are in preview, there is an implicit contract that things might change. But it would be nice if large changes, like changing the agent nodes' operating system edition, were documented somehow. Given the things mentioned in this comment, I'm thinking that there might be a bug.)
@bremnes You are right. ACS Engine has switched to RS3 Windows, which uses 1709 and Server Core. The documentation is not yet fully up to date and I am working on that. In general, the only change needed is to use a container image with the 1709 tag, which requires the current workload to be built/upgraded/refactored on a 1709 image.
As to microsoft/aspnetcore:2.0-nanoserver-1709, I can run docker run -it microsoft/aspnetcore:2.0-nanoserver-1709 cmd on a 1.8.2 k8s Windows cluster without issue. Which version of k8s are you using?
@JiangtianLi From the cluster created through ARM template:
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.9", GitCommit:"19fe91923d584c30bd6db5c5a21e9f0d5f742de8", GitTreeState:"clean", BuildDate:"2017-10-19T16:55:06Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Same for another cluster which was created through the Portal. It doesn't seem possible to select the version through the Portal wizard, or is it?
Just to make sure we are on the same page, this is ACS and not ACS-engine. How about your 1.8.2 cluster? I'm not sure how ACS and ACS-engine are synced on the k8s version, or whether versions are "gated" into the ACS project.
@bremnes I am using ACS-Engine, so I can choose the version. The Portal uses the default version. I will need to try ACS.
@JiangtianLi Just FYI, I opened another issue related to the creation of the ACS cluster - #89. Not sure if that could've had some impact on the (mis)configuration of the agent nodes in this issue.
@bremnes I created a Windows k8s cluster from the Azure portal in UKWest. I deployed a workload with the yaml from https://raw.githubusercontent.com/JiangtianLi/Examples/master/windows/basic/simpleweb.yaml and the container was created successfully. Then I used the microsoft/aspnetcore:2.0-nanoserver-1709 image and got the same error as you. I also tried docker run -it microsoft/aspnetcore:2.0-nanoserver-1709 cmd and it failed.
It appears that the microsoft/aspnetcore:2.0-nanoserver-1709 container image has some problem on RS3.
@PatrickLang Is this a known issue?
@JiangtianLi Can you help clarify? I'm confused because you said it was working in https://github.com/Azure/ACS/issues/88#issuecomment-346084185
Did you change something, and are you now getting this error (copied from @bremnes)?
Doesn't work:
docker run microsoft/aspnetcore:2.0-nanoserver-1709
docker: failed to register layer: re-exec error: exit status 1: output: remove \\?\C:\ProgramData\docker\windowsfilter\eecc1639c6223893c5fef33bcc29aae8f969ed7acd1b3f45f51a765d3ba494fd\UtilityVM\Files\Windows\System32\diagtrack.dll: Access is denied.
@PatrickLang Yes, docker run microsoft/aspnetcore:2.0-nanoserver-1709 worked before on the Windows node created by ACS-Engine. But on the new ACS Windows node I just created, it didn't work and gave the same error as @bremnes reported. The difference I can think of:
I'll need to re-create the ACS-Engine cluster with the same parameters, but my guess is that the container image has some conflict with the customized setup on the k8s Windows node.
@PatrickLang I created another ACS cluster because I deleted the previous cluster 2 hours ago. However, docker run -it microsoft/aspnetcore:2.0-nanoserver-1709 cmd succeeded this time. It seems the issue is random. I'll keep the current cluster and create another one to see if it repros.
@PatrickLang I created another ACS cluster with the same VM, but I couldn't repro the issue. The image appears to have been updated recently:
microsoft/aspnetcore 2.0-nanoserver-1709 8a080e5ebae7 16 hours ago
@bremnes Can you repro the issue now?
@JiangtianLi I just tried a new cluster in UK West and was able to reproduce the diagtrack.dll error. ACS Kubernetes cluster with 1 master and 1 windows agent (DS2_V2_Standard).
I just created (with acs-engine v0.9.4) a Kubernetes 1.8.2 cluster in westeurope. It turns out that our microsoft/windowsservercore-based containers, which used to work on Kubernetes 1.6, now fail to deploy on the new Kubernetes 1.8.2 Windows nodes with the same error message as above:
The operating system of the container does not match the operating system of the host
While checking, I found out that Kubernetes has an issue finding out what Docker version is used on the Windows nodes: look here
I'm worried by the "docker://Unknown" mentions.
Also note that Windows Kernel versions were unknown in Kubernetes 1.6.6 without any issue.
@odauby With acs-engine v0.9.4, the Windows node uses RS3 and is compatible with the microsoft/windowsservercore:1709 container image (https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility).
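The host/image matching rule discussed here can be illustrated with a small check. This is only a sketch under stated assumptions: it covers just the two host builds that appear in this thread (14393 for Windows Server 2016 and 16299 for version 1709), it assumes that an image tag containing "1709" targets the 1709 release and that any other tag targets the 2016 release (which matches the images tried above but is a heuristic, not a guarantee), and it assumes process isolation. The names HOST_BUILD_TO_RELEASE, image_release and is_compatible are hypothetical.

```python
# Host build numbers seen in this thread, mapped to the release they belong to.
HOST_BUILD_TO_RELEASE = {
    14393: "2016",  # Windows Server 2016 Datacenter
    16299: "1709",  # Windows Server, version 1709 (RS3)
}


def image_release(image_ref):
    """Guess the Windows release an image targets from its tag (heuristic)."""
    # Only look at the last path segment so a registry port is not mistaken for a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    _name, sep, tag = last_segment.partition(":")
    tag = tag if sep else "latest"
    # Assumption: tags that mention 1709 target RS3, everything else targets 2016.
    return "1709" if "1709" in tag else "2016"


def is_compatible(host_build, image_ref):
    """Return True when the image's guessed release matches the host's release."""
    host_release = HOST_BUILD_TO_RELEASE.get(host_build)
    if host_release is None:
        raise ValueError(f"unknown host build: {host_build}")
    return host_release == image_release(image_ref)
```

With these assumptions, is_compatible(16299, "microsoft/aspnetcore:2.0") is False, which corresponds to the "operating system of the container does not match the operating system of the host" error, while the diagtrack.dll error on a 1709-tagged image is a separate problem that this check does not detect.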
@JiangtianLi @PatrickLang, have you made any progress on the nanoserver-1709 error? If it's random I can keep trying until I get lucky, but none of the 7-8 clusters I've tried so far have worked.
@bremnes From my side, the error does not repro consistently, so I will need to get hold of a failed cluster. Does this repro with other nanoserver images? Is it possible for you to share the cluster or collect a trace for us? @PatrickLang Do you know which HCS trace or other data should be collected?
@JiangtianLi I have now tested by creating 6 clusters, and they all fail on pulling the nanoserver image with the diagtrack.dll error. See this gist for the script I used.
We created a support ticket through Azure, where your colleague confirmed that he was able to reproduce it as well (#117112717221074). It might be easier to go through him, but if you want, you can have all the cluster information for one of the clusters we just created (we deleted the rest) - just let me know where you want the information.
@bremnes As discussed in another thread, this issue appears to be a race condition when pulling two nanoserver images at the same time. I have looped in the container folks to work on a fix.
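If the failure really is triggered by concurrent pulls, one conceivable stopgap is to pull the images one at a time on the node. The following is a hedged sketch of that idea only; it is not the fix discussed in this thread (which is a change in Docker itself) and it has not been verified against the bug. The function name pull_sequentially and its parameters are hypothetical; the only external command it relies on is docker pull.

```python
import subprocess
import time


def pull_sequentially(images, run=subprocess.run, retries=3,
                      delay_seconds=5.0, sleep=time.sleep):
    """Pull images one at a time, retrying each failed pull a few times.

    Returns the list of images that still failed after all retries.
    `run` and `sleep` are injectable so the logic can be exercised without Docker.
    """
    failed = []
    for image in images:
        for attempt in range(1, retries + 1):
            # Only one pull is ever in flight, so two layers are never
            # registered concurrently by this script.
            result = run(["docker", "pull", image])
            if result.returncode == 0:
                break
            if attempt < retries:
                sleep(delay_seconds)
        else:
            # The inner loop finished without a successful pull.
            failed.append(image)
    return failed
```

A call such as pull_sequentially(["microsoft/aspnetcore:2.0-nanoserver-1709", "microsoft/iis:windowsservercore-1709"]) would pull the two images from this thread in order and return any that could not be pulled.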
Darren's still working on a fix, will be sending PR to moby/moby
Hi all - is there any update for this? We're having this issue as of today. jakkaj/aspnanottest is a public Docker hub image that can replicate this. Cluster created with only one Windows node (no Linux nodes).
acs-engine v0.15.2
kubelet, 51742k8s9000 Failed to pull image "jakkaj/aspnanottest": rpc error: code = Unknown desc = failed to register layer: re-exec error: exit status 1: output: remove \\?\C:\ProgramData\docker\windowsfilter\8be6cb4949d1271d5145ca143874a1ef1254dbd3394af7723bd56f64fd5a791f\UtilityVM\Files\Windows\System32\NetSetupApi.dll: Access is denied.
@jakkaj the access denied issue is tracked here: https://github.com/moby/moby/issues/36092 , fix in the works for Docker-EE within a few weeks
Is this a request for help?: Yes, documentation
Context: Created a new Kubernetes cluster/environment. The same Docker images that work in our current/old environment don't work in the new one and give us the following error:
The images have been built on aspnetcore:2.0 for Windows/Nano Server.
When we created our cluster ~45 days ago, we got regular Windows Server 2016 Datacenter edition virtual machines as agent nodes. This was confirmed by remoting in to them now and seeing a full desktop experience. Remoting in to the agents in the new cluster gives us a command prompt. Not sure if it's maybe Server Core?
Is there any documentation/update log available explaining what's been done behind the scenes in ACS? And can we specify the OS image when creating a new cluster via the portal, CLI or ARM template? Is this related to the 1709/Fall Creators Update?
Old nodes: OS Name: Microsoft Windows Server 2016 Datacenter OS Version: 10.0.14393 N/A Build 14393
New nodes: OS Name: Microsoft Windows Server Datacenter OS Version: 10.0.16299 N/A Build 16299