Closed stevenmiller closed 5 years ago
@stevenmiller First of all, I can see that this is your first issue on GitHub, so let me say welcome :)
Then some side comments (which you don't need to respond to, but it may be a good idea to check them out):
As for this issue itself: IMO your stack file looks exactly like what the documentation says should work, but in real life it looks like the Windows implementation of the overlay network and the components behind it (for example, hcsshim) have some undocumented weaknesses.
@olljanat Thanks for the tips! We are revising the apps to run what we can on linux containers. At least one service requires Windows so we are stuck with it for the time being. Is there any further troubleshooting you can advise with this current setup or is likely just an issue we will have to wait out?
Just as a note: with my org's Microsoft SA agreement we get support, so I have a technical support case open regarding this issue. I will update with the resolution (if one is found).
> At least one service requires Windows so we are stuck with it for the time being. Is there any further troubleshooting you can advise with this current setup or is likely just an issue we will have to wait out?
Depending on your application architecture (which apps need to talk to which), you can, for example, place Windows and Linux containers on the same overlay network so they can talk to each other inside it.
> Just as a note: with my org's Microsoft SA agreement we get support, so I have a technical support case open regarding this issue. I will update with the resolution (if one is found).
OK. It will be really interesting to hear whether they solve this issue, as my earlier experiences with the level of Microsoft support have not been too positive.
Here's the final email from MS support. Unfortunately there is no resolution at this time. However, we did find that if you deploy a container and publish its default port unchanged (say 80 for IIS), it is reachable from both nodes without a problem. It seems to be an issue with the port translation between hosts:
lab setup
PS C:\Users\Administrator> docker version
Client:
 Version: 18.09.0
 API version: 1.39
 Go version: go1.10.3
 Git commit: 33a45cd0a2
 Built: unknown-buildtime
 OS/Arch: windows/amd64
 Experimental: false
// create overlay network
PS C:\Users\TEMP> docker network create --driver=overlay overlay-nw
vqixaimowmg9ch6byz089jmph

PS C:\Users\TEMP> docker network ls
NETWORK ID     NAME         DRIVER    SCOPE
08ick3kig81k   ingress      overlay   swarm
4e5485733a1d   nat          nat       local
8ab446910e24   none         null      local
vqixaimowmg9   overlay-nw   overlay   swarm
// Init Swarm on manager node
PS C:\Users\TEMP> docker swarm init --advertise-addr=10.168.179.56 --listen-addr 10.168.179.56:2377

// Join the swarm as a worker on the Windows worker node
PS C:\Users\Administrator.WUXI> docker swarm join --token SWMTKN-1-5lazzfmd1wyft4egtnzope30i1n744qlw99bj79qtd4ccdv2hg-b4j4fkki75ud34c9n5fvmnffw 10.168.179.56:2377

// Join the swarm as a worker on the Linux worker node
admin@ubuntu-6011:~$ docker swarm join --token SWMTKN-1-5lazzfmd1wyft4egtnzope30i1n744qlw99bj79qtd4ccdv2hg-b4j4fkki75ud34c9n5fvmnffw 10.168.179.56:2377
PS C:\Users\TEMP> docker node ls
ID                            HOSTNAME      STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
e99kxpcgwhkgtte7dwwko5pl0 *   WS2019        Ready    Active         Leader           18.09.0
whjthcivqkvpj5oz45w6xg3de     WS2019-B      Ready    Active                          18.09.0
7mpt4wn2ioi4atmtmkxdu7q6u     ubuntu-6011   Ready    Active                          18.09.0
// create service on master node
docker service create --name=test --endpoint-mode vip --network=overlay-nw microsoft/iis cmd
docker service update --publish-add published=8080,target=80 test
Problem repro and analysis
The test result shows a packet flow similar to the customer's. The problem is that the Linux node didn't translate the published port 8080 to target port 80.
Failed Packet flow: Client -> Linux 10.168.179.42:8080[10.255.0.4] -> Windows 10.168.179.56:8080 [10.255.0.3:8080]
Success Packet flow: Client -> Windows 10.168.179.55:8080[10.255.0.3] -> Windows 10.168.179.56:80 [10.255.0.3:80]
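To make the difference between the two flows explicit, here is a minimal sketch that models the publish rule from the lab setup (published=8080, target=80) and checks whether the port that reached the backend matches it. This is only an illustration of the reported behavior, not how Docker implements ingress routing; the names `PublishRule`, `expected_backend_port` and `was_translated` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishRule:
    """Hypothetical model of `--publish-add published=...,target=...`."""
    published: int  # port clients connect to on any swarm node
    target: int     # port the container actually listens on


def expected_backend_port(entry_port: int, rule: PublishRule) -> int:
    """Port the backend should receive when a client hits entry_port."""
    if entry_port != rule.published:
        raise ValueError(f"port {entry_port} is not published by this rule")
    return rule.target


def was_translated(entry_port: int, delivered_port: int, rule: PublishRule) -> bool:
    """True if the delivered port matches what the publish rule calls for."""
    return delivered_port == expected_backend_port(entry_port, rule)


# Publish rule and observed ports taken from the packet flows above.
rule = PublishRule(published=8080, target=80)
observed = {
    "entered via Linux node 10.168.179.42": (8080, 8080),   # failed flow
    "entered via Windows node 10.168.179.55": (8080, 80),   # successful flow
}
for label, (entry, delivered) in observed.items():
    status = "translated" if was_translated(entry, delivered, rule) else "NOT translated"
    print(f"{label}: {entry} -> {delivered} ({status})")
```

Read this way, the failed flow is the one where the destination port is still 8080 when the traffic arrives at the Windows host, which matches the conclusion that the translation step was skipped on the Linux entry node.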
Summary
Based on our test results and network traffic analysis, we found that after deploying a Docker swarm with mixed Windows and Linux nodes and publishing the IIS service as 8080:80, requests to port 8080 on the Windows nodes are forwarded to port 80 of the IIS container. However, requests to port 8080 on the Linux node are not translated to the IIS container's port 80.
Suggestion
After discussion, and according to the log analysis, there seem to be some compatibility issues on the Linux node when translating the published port to the real container port. Since this may be related to the Linux system's compatibility with the Docker swarm ingress network, we recommend contacting the Linux vendor to look into the issue more deeply.
Btw is this related to moby/moby#38484 ?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so.
Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale
Closed issues are locked after 30 days of inactivity. This helps our team focus on active issues.
If you have found a problem that seems similar to this, please open a new issue.
Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle locked
Howdy!
I have a two-node hybrid swarm, with one Ubuntu Linux 18.04 node acting as manager and one Windows 2019 node participating as a worker. The sample environment we are testing consists entirely of Windows containers. We are able to reach the published ports for these services via the Windows node (ex. 10.20.1.121:8888) but not via the Linux node (ex. 10.20.1.122:8888). Our understanding of the ingress mesh networking is that this should work fairly seamlessly, as it will route traffic within the swarm to the correct node.
This issue is also occurring with the swarmpit UI stack we have deployed to the swarm. The UI (10.20.1.122:888) cannot be reached via the Windows node IP (10.20.1.121:888).
Expected behavior
Services running on one node should be accessible from published port on another node
Actual behavior
Published Windows container services are not accessible from the Linux node, and vice versa
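For anyone who wants to check the expected vs. actual behavior on their own swarm, here is a minimal sketch that tries a plain TCP connection to the same published port on every node address. It only tests whether the port accepts connections, not whether the service behind it answers correctly. The helper names `port_reachable` and `probe_nodes` are made up for this example, and the addresses in the usage note are the ones from this report.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_nodes(nodes, port, timeout: float = 3.0):
    """Map each node address to whether the published port accepts connections."""
    return {host: port_reachable(host, port, timeout) for host in nodes}
```

With a working routing mesh, `probe_nodes(["10.20.1.121", "10.20.1.122"], 8888)` should report `True` for both addresses. In the situation described here, only the node actually running the container would be expected to come back `True`.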
Information
Published services and ports:
Both the environment we have pulled from a private repo and the swarmpit stack exhibit this behavior. I am unable to reach the swarmpit UI from port 888 on the Windows node IP.
Steps to reproduce the behavior
Here is the stack with our private repo removed:
teststack.zip
Let me know if I can provide any other info.