docker / for-win

Bug reports for Docker Desktop for Windows
https://www.docker.com/products/docker#/windows

Stack Service Containers Can't Communicate Across Nodes #1476

Open Vacant0mens opened 6 years ago

Vacant0mens commented 6 years ago

Expected behavior

Services in the same Stack/overlay network should be able to talk to each other.

Actual behavior

Only replicas that are hosted on the same node can talk to each other.

Information

Windows Server 2016, 1709 build, latest updates (not Insider Preview)

output of docker version:

Client:
 Version:      17.06.2-ee-6
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   e75fdb8
 Built:        Mon Nov 27 22:46:09 2017
 OS/Arch:      windows/amd64

Server:
 Version:      17.06.2-ee-6
 API version:  1.30 (minimum version 1.24)
 Go version:   go1.8.3
 Git commit:   e75fdb8
 Built:        Mon Nov 27 22:55:16 2017
 OS/Arch:      windows/amd64
 Experimental: false

netstat -ano shows that port 4789/udp is not open or listening (it doesn't come up in the list at all).
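
For anyone checking the same thing, a sketch of that check (hedged: these are just the standard Windows tools for listing UDP listeners, nothing Docker-specific):

netstat -ano -p udp | findstr ":4789"
Get-NetUDPEndpoint | Where-Object { $_.LocalPort -eq 4789 }

If the VXLAN data-plane port 4789/udp does not show up in either output, nothing on the node is receiving overlay traffic.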

Based on this issue in docker/swarm, it doesn't seem to be a Windows-only problem.

Steps to reproduce the behavior

  1. Deploy a stack with two or more services
  2. docker exec into one of the started containers
  3. ping another service by its DNS name (dnsrr) or by its IP address (a command sketch follows)
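
A minimal command sequence for those steps (a sketch: the stack name simple, the service name app2, the 10.0.0.5 address and the container ID placeholder are illustrative, not from the original report; docker ps is only there to find a container to exec into):

docker stack deploy -c docker-compose.yml simple
docker ps
docker exec -it <container id> powershell
ping simple_app2
ping 10.0.0.5
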
Vacant0mens commented 6 years ago

ping @jasonbivins

docker-robott commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale comment. Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale

olljanat commented 6 years ago

We have seen this same issue with Win Srv 2016, version 1709 and even with the latest 1803 version. Docker version 17.06.2-ee-10 or 18.03.1-ce makes no difference.

We also found, on a Linux + Windows hybrid swarm, that the Linux nodes are listening on VXLAN port 4789/udp but the Windows nodes are not.

@thaJeztah I know that you are syncing items between GitHub projects, so do you know if there is another item open for this issue?

It would also be nice to hear whether @StefanScherer has seen this issue (I know he is one of the pioneers in using a Linux + Windows hybrid swarm).

olljanat commented 6 years ago

@kallie-b ping. This is most probably an issue in the Windows overlay implementation.

kallie-b commented 6 years ago

@daschott @jmesser

(I've actually moved off of this project, unfortunately! But David or Jason should be able to help out).

olljanat commented 6 years ago

OK. Thanks anyway for connecting this issue to the right people.

I can also report that I just tested creating a new Docker swarm where I have Ubuntu 16.04 as manager and Windows Server, version 1803 as worker, and both of them are running the same Docker version, 17.06.2-ee-10.

Then I created test services using these commands:

docker network create --driver overlay test
docker service create --name linux --network test --constraint node.platform.os==linux nginx
docker service create --name win --network test --constraint node.platform.os==windows microsoft/iis:nanoserver

And here you can see that a ping from the container on Linux to the container on Windows works:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS               NAMES
ea7e22a00d84        nginx:latest        "nginx -g 'daemon ..."   About a minute ago   Up About a minute   80/tcp              linux.1.583ua7658uxz8msc7l3uiof5w

$ docker exec ea7e22a00d84 ping -c 4 win
PING win (10.0.0.4) 56(84) bytes of data.
64 bytes from 10.0.0.4 (10.0.0.4): icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 10.0.0.4 (10.0.0.4): icmp_seq=2 ttl=64 time=0.047 ms
64 bytes from 10.0.0.4 (10.0.0.4): icmp_seq=3 ttl=64 time=0.048 ms
64 bytes from 10.0.0.4 (10.0.0.4): icmp_seq=4 ttl=64 time=0.068 ms

--- win ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.037/0.050/0.068/0.011 ms

but a ping from the container on Windows to the container on Linux fails:

C:\>docker ps
CONTAINER ID        IMAGE                      COMMAND                   CREATED             STATUS              PORTS               NAMES
b531d3fd223b        microsoft/iis:nanoserver   "C:\\ServiceMonitor..."   2 minutes ago       Up 2 minutes        80/tcp              win.1.nc0hvqtzgrs3bpkpkqeohw65y

C:\>docker exec b531d3fd223b ping linux

Pinging linux [10.0.0.2] with 32 bytes of data:
Reply from 10.0.0.5: Destination host unreachable.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 10.0.0.2:
    Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),
olljanat commented 6 years ago

@Vacant0mens FYI, I found a workaround for this issue. Even though this blog post says that routing mesh is supported on Windows Server, version 1709 and above, you still must use DNS round-robin endpoint mode ( --endpoint-mode dnsrr ) for all services that you want to connect to from Windows containers.

So the fixed version of my commands above is:

docker network create --driver overlay test
docker service create --name linux --network test --endpoint-mode dnsrr --constraint node.platform.os==linux nginx
docker service create --name win --network test --endpoint-mode dnsrr --constraint node.platform.os==windows microsoft/iis:nanoserver

Very good documentation about this is available here: https://sreeninet.wordpress.com/2016/07/29/service-discovery-and-load-balancing-internals-in-docker-1-12/

Vacant0mens commented 6 years ago

@olljanat, I'm not sure how different it is, but I was doing something that looks very similar with a Stack and a Compose file, except that I was using only Windows. That's where this problem came up. I made my own Nginx image for Windows and it was unable to communicate with the other services running on the other hosts in the swarm. It could connect to services whose containers ran on the same host, but not the others. I would try pinging your Linux service/container from the iis container and see what happens.

olljanat commented 6 years ago

@Vacant0mens to me it sounds like the same problem. At some point I tested adding multiple Windows nodes to the swarm, and services running on them were not able to communicate with services on other nodes.

And just FYI, I actually spent this whole week on this issue because it is very critical for our production use cases; I tested multiple combinations of versions, configs, etc., and this setting was the only one that had any effect.

thaJeztah commented 6 years ago

/cc @carlfischer1 @johnstep

scmikes commented 6 years ago

Note: I am now seeing this issue on a single-node swarm, Windows Server 2016, Docker EE.

olljanat commented 6 years ago

@scmikes that is interesting. Did you try whether the endpoint_mode workaround above helps?

scmikes commented 6 years ago

Same failure with endpoint mode.

Very simple yml file, contributed by someone else who verified it at their site:


version: '3.3'

networks:
  my-net:
    driver: overlay
    attachable: true

services:

  app1:
    image:  hello-world
    networks:
      - my-net
    deploy:
      endpoint_mode: dnsrr 
    command: powershell -command Start-Sleep 86400
  app2:
    image:  hello-world
    networks:
      - my-net
    deploy:
      endpoint_mode: dnsrr 
    command: powershell -command Start-Sleep 86400  
  1. Save the yml as docker-compose.yml
  2. docker swarm init ...
  3. docker stack deploy -c docker-compose.yml simple
  4. docker ps to get the container name
  5. docker exec -it <container name> powershell
  6. ping the other service

PS C:\> ping simple_app2
Ping request could not find host simple_app2. Please check the name and try again.
PS C:\> ping simple_app1
Ping request could not find host simple_app1. Please check the name and try again.

Note: same results with or without the deploy: endpoint_mode tag

Verified on:

Windows 10, build 1803, Docker stable and edge; Windows Server 2016, Docker EE

olljanat commented 6 years ago

@scmikes I can confirm that this issue also happens on my Win 10, build 1803 with docker-ce stable.

And it looks like it is not just a stack issue: the same thing happens if I manually create services for these:

docker network create --driver overlay my-net
docker service create --name app1 --network my-net --endpoint-mode dnsrr hello-world powershell -command Start-Sleep 86400
docker service create --name app2 --network my-net --endpoint-mode dnsrr hello-world powershell -command Start-Sleep 86400

So for that issue we don't even have a workaround (or maybe the workaround would be to have two Windows machines in the swarm, place these services on different nodes and use --endpoint-mode dnsrr, but I have not tested that).
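
A sketch of that untested idea, assuming two Windows nodes with the hypothetical hostnames win-node1 and win-node2 (the hostnames are placeholders; everything else is the same as above plus a placement constraint):

docker network create --driver overlay my-net
docker service create --name app1 --network my-net --endpoint-mode dnsrr --constraint node.hostname==win-node1 hello-world powershell -command Start-Sleep 86400
docker service create --name app2 --network my-net --endpoint-mode dnsrr --constraint node.hostname==win-node2 hello-world powershell -command Start-Sleep 86400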

scmikes commented 6 years ago

@olljanat Thank you for verifying this. That makes 3 out of 3 who have verified it.

We are working to support Windows containers as well as Linux. Our Linux containers have been fine for quite some time.

Doesn't this issue indicate that swarm mode with multiple services per node is non-functional on Windows?

Thank you very much, Michael

olljanat commented 6 years ago

Yes, that confirms that there cannot be multiple services per node which need to be able to communicate with each other.

@scmikes if you are planning to run Windows containers with swarm mode in production, you probably want to look at my issue-tracking repository

Vacant0mens commented 6 years ago

@olljanat I have seen many of the problems in your issue tracking show up in Windows-only swarms, some even with the 1709 build. Will there be a new version of Docker EE for Windows any time soon? It seems to be neglected amid all the Docker EE 2.0 hype. Originally it was stated that EE would get updates every 3 months, but we're coming up on a year now without any significant updates. Is 18.03, or 18.06, coming any time soon? My team has had to resort to other means of running our new applications because we couldn't get Docker on Windows working stably enough to even properly test in our Dev environment.

scmikes commented 6 years ago

@olljanat, Thanks for the link and the info. This bug was such a showstopper for us that we were not able to proceed with our Linux-to-Windows container port. We are hoping to offer our customers both Linux and Windows containers.

Thanks again,
Michael Schneider

olljanat commented 6 years ago

@Vacant0mens, note that Windows Server, version 1803 was published to the MSDN/volume licensing site on the 7th of May. You should definitely use that instead of the old 1709 build, as it contains plenty of improvements to performance, etc. It also looks like they have frozen Docker EE features at the 17.06 level but are still backporting fixes to it quite often ( https://docs.docker.com/ee/engine/release-notes/ ), which is probably even better than doing big feature updates, as there are plenty of stability issues, especially with Windows containers.

The Docker EE 2.0 hype most probably comes from UCP, which is a good tool but quite expensive.

We have been running Windows containers on 2016 for over a year now in our dev/test environments and they work fine in a single-server setup, but it looks like swarm mode, and especially the overlay network, is what causes the issues.

@Vacant0mens / @scmikes if this is a critical issue for you and you have the possibility, then consider creating a case with Microsoft Premier Support: https://success.docker.com/article/where-to-get-help-with-windows We are able to connect from Linux to Windows and Windows to Linux now with the --endpoint-mode dnsrr workaround, so we probably will not do that.

olljanat commented 6 years ago

Actually, it looks like Docker EE on Windows Server, version 1803 works correctly.

I just deployed this stack to a swarm where I have Ubuntu 16.04 as manager and Windows Server, version 1803 as worker. Both of them are running Docker version 17.06.2-ee-13, and connections between all the containers work just fine :)

version: '3.3'

networks:
  test:
    driver: overlay

services:
  win1:
    image: microsoft/nanoserver:1803_KB4103721
    networks:
      - test
    deploy:
      endpoint_mode: dnsrr
      placement:
        constraints:
          - node.platform.os==windows
    command: ping -t 127.0.0.1
  win2:
    image: microsoft/nanoserver:1803_KB4103721
    networks:
      - test
    deploy:
      endpoint_mode: dnsrr
      placement:
        constraints:
          - node.platform.os==windows
    command: ping -t 127.0.0.1
  linux1:
    image: alpine:3.7
    networks:
      - test
    deploy:
      endpoint_mode: dnsrr
      placement:
        constraints:
          - node.platform.os==linux
    command: sh -c "ping 127.0.0.1"
  linux2:
    image: alpine:3.7
    networks:
      - test
    deploy:
      endpoint_mode: dnsrr
      placement:
        constraints:
          - node.platform.os==linux
    command: sh -c "ping 127.0.0.1"

So it is probably just Windows 10 or Docker CE where even the endpoint_mode: dnsrr setting does not fix the connectivity issues.

scmikes commented 6 years ago

That is a good workaround if you have Windows and Linux nodes.

Our goal is to support all-Windows sites with Windows containers, and to use Linux containers on sites that have Linux.

If a site has any Linux, we would likely deploy Linux containers, since they are more robust.

If a site is Windows only, we would like to be able to provide Windows containers. This is still a critical workflow.

It is very good to know that Windows containers work with a Linux swarm master. Thanks for the very good info.

Still can't mark this as a workaround for key customers with Windows-only sites, though. :-( Michael

olljanat commented 6 years ago

@scmikes I mean that with that configuration all the Windows containers are also able to communicate with each other, so I recommend you try that Windows Server version with two Windows nodes. I assume that it will work.

Also note that "Only worker nodes are supported on Windows, and all manager nodes in the cluster must run on Linux." But there should not be anything preventing you from doing that.

Vacant0mens commented 6 years ago

@olljanat is there support for Windows-only swarms without using UCP? (As in, just running Docker in swarm mode and managing each stack/service via the command line.) It's very bothersome that every time I ask about my situation the response is "Windows is only supported as workers under UCP" rather than something more useful, like talking about Windows-only swarms where UCP isn't involved at all. Does no one test that scenario or something?

scmikes commented 6 years ago

@olljanat Sorry, I was not clear in explaining our issue.

All of our code is heterogeneous, so it can run in either Windows or Linux containers. Also, only a small percentage of our code is containerized at this time. This is the first release with containers.

So we have customers in 3 camps:

  1. All Linux: easy one, just deploy on Linux with Linux containers.
  2. Mix of Linux and Windows: we would recommend running Linux containers on a Linux host at this time, since Windows containers are very new.
  3. Windows only: first preference is Linux containers on LCOW, then we only need to validate one type of container; second is Windows containers on Windows hosts.

The solution you are proposing is 2). If there are Linux machines in house, then we would recommend deploying Linux containers. It is of no value to us to run a Linux master and Windows worker nodes.

It is an interesting option, especially if a product has code that must run in Windows containers. Your approach gives them a way to stand up a swarm with Windows containers. Very nice. It does not add value for us at this time, though.

olljanat commented 6 years ago

@Vacant0mens good question, but I don't know the answer. What I can tell you is that we are not using UCP, and all of our production Linux servers are using Docker 17.09.1-ce instead of the enterprise edition.

@scmikes based on my experience, I recommend running all applications that can run as Linux containers on that platform (number 1 on your list) because of the much smaller overhead and better support from the community.

The only reason we are using a mixed Linux + Windows swarm is that we have plenty of applications which were originally made for the Windows platform and which cannot be ported to Linux.

As for LCOW, Hyper-V isolation mode is a prerequisite for it. Microsoft has done very good work with Windows Server, version 1803 and improved performance a lot, but if you switch to Hyper-V isolation mode you lose all those improvements and you are back to the poor performance and huge overhead you see with Windows Server 2016. That is why we made a no-go decision on LCOW. If you don't trust me, you can very easily try it and see for yourself.

And just to make my recommendation clear for anyone who reads this: if you have the possibility to run Linux-only containers, DO THAT. Don't waste your time and money on Windows containers. I would definitely do the same if that were an option.

Vacant0mens commented 6 years ago

The place I work is a full Windows shop. We only have a minimal number of Linux machines (all of which are some kind of appliance), so running Linux would take a lot of convincing and a lot of planning with a lot of people to get policies, auditing, security, etc. in place to be able to run Linux servers. While it may happen some day, it's definitely not in the cards at this point.

LCOW is a decent concept in general, and probably useful to some, but last I heard it was not very stable, not to mention Hyper-V isolation increases hardware overhead sevenfold. We may as well be trying to automate Hyper-V to stand up VMs for each of our microservices in that case.

The best solution for Windows shops (like mine) that can't/don't have any Linux machines is the obvious one: Windows containers on Windows. You both are talking like this is some backwater option that no one has ever heard of, but this is the reason Docker exists on Windows Server. It bothers me that Windows is rarely given the time of day, especially in terms of development. We were forced to switch everything over to a service management program to temporarily run our microservices directly on our servers, because we have been running into issue after issue for the last year with Docker and couldn't even get our software into Alpha properly. We'll be running our software this way until we can make sure that Docker is stable enough on Windows to run our software in production.

olljanat commented 6 years ago

@Vacant0mens OK, then I would say go for it: set up a Windows-only swarm with the latest versions and tell us how it works?

Server 1803 was a big step from Microsoft, so I assume it should be quite stable now.

And even on Win 10 you can run Windows containers just fine, but currently you cannot use swarm mode there if you need connections between those containers.

rahul24 commented 6 years ago

Stuck with the same problem, where all nodes are on Windows Server Core 1803. The stack services are not able to communicate with each other.

  1. One service on the manager node and the other on a worker node - not working
  2. Both services on the manager and worker nodes - not working

Has anyone found a workaround for this?

olljanat commented 6 years ago

@rahul24 the last stack file which I posted here should work just fine. The trick is to have the "endpoint_mode: dnsrr" setting on all the services.

rahul24 commented 6 years ago

@olljanat But will the above config work when both the manager and the worker are on Windows Server Core 1803?

olljanat commented 6 years ago

Why don't you try it and see? There is no reason why it would not work. All managers are also workers, so unless you add a constraint that prevents services from being deployed on managers, they will run services the same way as all the workers.

Managers just have the extra responsibility of taking care of the swarm configuration and scheduling tasks onto the other nodes.
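
If you do want to keep a service off the managers, a hedged example using the standard node.role placement attribute (the service name and image are just the ones from my earlier example):

docker service create --name win --network test --endpoint-mode dnsrr --constraint node.role==worker microsoft/iis:nanoserver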

rahul24 commented 6 years ago

@olljanat I tried with the above script but got the same result: not able to communicate.

Docker Compose File:

version: '3.3'

networks:
  test:
    driver: overlay

services:
  service1:
    image: service1    
    networks:
      - test
    deploy:
      endpoint_mode: dnsrr      
    ports:
     - target: 80
       published: 80
       mode: host    

  service2:
    image: service2
    networks:
      - test
    deploy:
      endpoint_mode: dnsrr      
    ports:
     - target: 80
       published: 80
       mode: host    


thaJeztah commented 6 years ago

You seem to be publishing two services to port 80; are they both actually up?

rahul24 commented 6 years ago

@thaJeztah Originally I used placement constraints: one service on the manager and the other on the worker. I even tried assigning different ports to the services and placing them only on the manager node. Nothing works.

olljanat commented 6 years ago

@rahul24 it sounds like you have missed some basic step, like the firewall rules, etc. (the usual swarm ports are listed in the sketch below).

If you want to be sure, first try creating a swarm where you have one Linux node as manager and Win 1803 as worker, and deploy my example stack there. That was the config which I tested and got everything working with; then you can try to promote the Windows node to manager and drop Linux from the swarm, etc...
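
For reference, the ports that normally have to be open between swarm nodes are 2377/tcp (cluster management), 7946/tcp and 7946/udp (node communication) and 4789/udp (VXLAN overlay traffic). A PowerShell sketch for the Windows side, assuming only the built-in Windows Firewall is in the way:

New-NetFirewallRule -DisplayName "Docker swarm mgmt" -Direction Inbound -Protocol TCP -LocalPort 2377 -Action Allow
New-NetFirewallRule -DisplayName "Docker swarm TCP"  -Direction Inbound -Protocol TCP -LocalPort 7946 -Action Allow
New-NetFirewallRule -DisplayName "Docker swarm UDP"  -Direction Inbound -Protocol UDP -LocalPort 7946 -Action Allow
New-NetFirewallRule -DisplayName "Docker VXLAN"      -Direction Inbound -Protocol UDP -LocalPort 4789 -Action Allow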

rahul24 commented 6 years ago

@olljanat - During the initial setup of my VM, I added all of Docker's ports to the firewall exceptions. In fact, I am able to set up routing mesh without any problem (single service only). The problem is that when I deploy two services they're not able to communicate with each other. I think I should try placing the manager on a Linux box, the way you have done it.

lukaszherman commented 6 years ago

I have a very similar, or the same, problem.

version: '3.3'

networks:
  test:
    driver: overlay

services:
  win1:
    image: microsoft/nanoserver:1803_KB4103721
    networks:
      - test
    deploy:
      endpoint_mode: dnsrr
      replicas: 3
    command: ping -t 127.0.0.1

Everything on 1803. Docker 17.06.2-ee-13

I am able to ping from one container to another using IP addresses. I'm not able to reach any container (in the same stack) using any of the names (service name, stack name). I am able to resolve DNS entries that live outside the swarm (for example, ping my MSSQL server).

I'm using Windows Containers, not Hyper-V Containers. Maybe that's the difference.

CassandraWin commented 6 years ago

Hi all, I just want to share my experience with Docker for Windows, as I also work in a full Windows shop.

As @Vacant0mens already said, it seems that Docker for Windows is not taken into account for swarm. We always hear "the manager must be a Linux node ...", but I have been able to run a hybrid swarm (Windows 10 + Linux nodes) for the past months.

I use two nodes, one Windows 10 with Docker CE 18.03 and one Linux Ubuntu 16.04 VM. I was able to make it work using the configuration @olljanat proposed, with endpoint_mode: dnsrr and mode: global. The communication between the Linux containers and the Windows containers was working fine using their service names.

But as soon as my Windows 10 node upgraded to 1803 (April 2018 Update), the DNS of the overlay network stopped working. It seems Microsoft broke something in the overlay network. The Windows containers can talk to each other, but there is no communication with the Linux ones.

For now, I am not able to make it work in this configuration. I have a staging machine based on Windows Server 2016 with Docker CE 18.03 and it's still working on it.

olljanat commented 6 years ago

@CassandraWin Windows 10 uses Hyper-V isolation mode, which is known to be buggy, especially with swarm mode.

That is why I recommend using non-swarm mode on Win 10 machines and swarm mode only on Windows Servers.

As the biggest issue here is that it is not clearly documented which scenarios are supported with Windows Containers, I created an issue about it: MicrosoftDocs/Virtualization-Documentation-Private#1531

ohdihe commented 6 years ago

Hi everyone. This is an interesting thread. I am having a slightly different issue with compose.yml when declaring both endpoint_mode: dnsrr and ports for services to be deployed on a Windows Server 2016 (2-node) cluster. I get an error saying "can't use dnsrr with publish mode". When I remove the port declaration section, including mode: host, from the compose.yml file, it works but I get no port to access the services. However, when I use the docker service create CLI command and specify "--endpoint-mode dnsrr --publish mode=host,target=80" it works just fine; I get a port to access the service. Please, is there something I am doing wrong?

Thanks.

rahul24 commented 6 years ago

@ohdihe - I believe, you're using the compose file which I posted above. If yes, then initially I added placement tag to deploy one service in manager and another in the worker. If you don't use the placement tag, then it gets deployed on the same node and try to use same port -80 which will end up in conflict.

ohdihe commented 6 years ago

@rahul24 - Thanks for the quick response. Let me try to explain it better. If I use a compose file that has only endpoint_mode: dnsrr specified, on version 3, and I check on the service by running the "docker service ps" command, the output shows no port. Also, when I remove the service and edit the compose file to include both endpoint_mode: dnsrr and the ports section (target, published and mode), I get an error along the lines of "can't use dnsrr with publish mode".

But if I just try creating the service individually using the CLI, docker service create --name web --endpoint-mode dnsrr --publish mode=host,target=8000 --network my_net --constraint 'node.labels.name=worker' myimage:latest, it works well, and I can access the service from both the manager node and the worker node.

My question is: why do I get an error when I specify both ports and endpoint_mode: dnsrr in the compose file, why do I get no ports to access the service when I specify just endpoint_mode: dnsrr, and why can I still not access the service when I specify only the ports section?

What version of the compose file are you using?

olljanat commented 6 years ago

@ohdihe based on the documentation you need to use at least compose file version 3.3.
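
For illustration, a minimal 3.3 sketch that combines endpoint_mode: dnsrr with a host-mode published port (the image name and port numbers are placeholders, not from this thread):

version: '3.3'

networks:
  my-net:
    driver: overlay

services:
  web:
    image: myimage:latest
    networks:
      - my-net
    deploy:
      endpoint_mode: dnsrr
    ports:
      - target: 80
        published: 8080
        mode: host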

ohdihe commented 6 years ago

@olljanat. Thank you very much. I was able to resolve it. The ports now show up and I can access the service via the host:container port configuration.

csgodd17 commented 6 years ago

Hi everyone,

I'm also seeing some unusual behavior when trying to use host mode and port mappings with Windows containers running as a stack. I'm using Docker EE 17.06.2, and when I run docker service create --publish mode=host,target=443,published=30000 the aspnet application runs fine and is accessible when I curl the host machine on the published port.

However, when I try and run docker stack deploy with a compose file like:

version: '3.3'
services:
  api:
    image: apiimage
    ports:
      - target: 443
        published: 30000
        mode: host

The service runs fine, but I can't successfully curl the host on port 30000 to access the service. When I run netstat on the host, the port is not listed. I've tried including endpoint_mode: dnsrr, which worked for @ohdihe, but no success.

olljanat commented 6 years ago

@csgodd17 try including a network definition in your stack file.

csgodd17 commented 6 years ago

@olljanat I just gave that a shot as well, but also no success. Same thing: try to curl the endpoint and it just hangs. No idea why this would work in a service configuration but not as a stack.

ohdihe commented 6 years ago

@csgodd17 A couple of things. What is your cluster configuration? If it is just a two-node cluster and you have services already running on the worker using that same port configuration, it could mean a port collision is causing your problem. Try removing the services that you have running and retrying the stack deployment, or give your stack a new name and change the port mappings.

Also, if you don't mind, please show the full .yml file.

csgodd17 commented 6 years ago

@ohdihe Thanks for your response! I'm curious what version of Docker you were running when you were able to get host mode running and ports exposed for Windows containers?

I am running a 5-manager, 5-worker cluster with CentOS masters and Windows workers. I am running 17.06.2-ee-13 on the Windows workers and 17.06.2-ce-1 on the masters, and using version 3.3 of the compose file format. I have tried running this as a global service, but usually I have been testing with only one replica. I've been double-checking for port collisions each time I deploy the stack, and I'm almost certain that isn't the problem here.

In desperation I created a single-master, single-worker cluster and upgraded them to 18.03.1, but I still have the same issue. I can definitely run the service correctly with the following service create command:

docker service create --name test --publish mode=host,target=443,published=22000 --replicas=1 --constraint "node.platform.os == windows" microsoft/aspnet:4.7.2

However, the docker-compose file equivalent of this is not working:

version: '3.3'
services:
  api:
    image: microsoft/aspnet:4.7.2
    ports:
      - target: 443
        published: 30000
        mode: host
    networks:
      - test
    deploy:
      endpoint_mode: dnsrr
      constraints:
        - node.role != manager
        - node.platform.os == windows

networks:
  test:
    driver: overlay

I have also tried running that without specifying a network or the endpoint_mode. I don't really see how either of those settings would affect host mode... based on my understanding of the documentation, all that should be needed is what is in the ports section 🤷‍♀️

When I run telnet it responds that no connection can be established, and when I run netstat on the host it doesn't list the port I'm trying to expose for the service. The output of docker service ps, however, lists the port mapping *:30000->443/tcp for the service.

I have also tried using a thousand different port numbers, just in case something in Windows has them auto-reserved.

ohdihe commented 6 years ago

@csgodd17 I am using 18.06.0-ce and, from your compose file, I am not sure the constraints section is right. It should be (example with a postgres image):

    image: postgres
    deploy:
      endpoint_mode: dnsrr
      placement:
        constraints: [node.labels.name == worker]

The constraints should name only the node(s) you want the services to run on. Docker is smart enough, so there is no need for "!=".

It might be that the service is not running on the node that you intend it to run on. You can check this by RDPing into the Windows worker and running the docker ps command to validate whether there is a running container for the service on that node.

I really don't know what I did to get mine working; I just changed the compose version to '3.3', then reset Docker on the manager just because, and surprisingly it started working.