Closed by nvrnight 6 years ago
I have started up other containers and am able to ping swarm-listener from them, so I believe the internal DNS is working correctly. Is it possible the dfp service is trying to grab the certs using that URL before the DNS configuration has been applied to the container?
When I try pinging the dfp service from the same container I used to ping swarm-listener, it also reports an unknown host, which makes me suspect that the container is not far enough along in its startup process to be registered in the swarm network's DNS.
After further testing, the dfp service started up fine by using -e SERVICE_NAME=127.0.0.1 so that may add a bit more truth to my theory above.
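One way to check the theory above is to poll name resolution from another container attached to the same network and see when (or whether) the name starts to resolve. The following is a minimal sketch, not part of DFP: wait_for_dns is a hypothetical helper name, and it assumes getent is available in the container image you run it from.

```shell
#!/bin/sh
# wait_for_dns NAME [ATTEMPTS] [PAUSE_SECONDS]
# Hypothetical helper (not part of DFP): polls name resolution until NAME
# resolves or the attempts run out. Returns 0 on success, 1 otherwise.
wait_for_dns() {
  name="$1"
  attempts="${2:-10}"
  pause="${3:-1}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # getent uses the container's configured resolver.
    if addr="$(getent hosts "$name")"; then
      # Print only the address column of the getent output.
      echo "attempt $i: $name resolved to ${addr%% *}"
      return 0
    fi
    echo "attempt $i: $name did not resolve"
    sleep "$pause"
    i=$((i + 1))
  done
  return 1
}
```

For example, running wait_for_dns dfp 30 1 right after creating the service would show how many seconds pass before the name dfp becomes resolvable, if it ever does.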
What is the status of the DFP service replicas? If it fails because the container is not far enough along in its startup process to be in the swarm networks' DNS registration, Swarm should kill the container and start a new instance and, by that time, DFSL should be running. Please list the processes of the DFP service (e.g., docker service ps dfp).
DFSL is running well before DFP tries to start up. The problem seems to be that DFP is trying to contact itself (by its DNS name) for the certs and cannot, because DFP's DNS name hasn't propagated to the swarm yet. That would explain why 127.0.0.1 works fine: no DNS lookup is needed to reach its own local IP.
Every time it fails to start, Swarm starts a new instance to replace the one that failed.
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
yixjv5vebl1m dfp.1 dockerflow/docker-flow-proxy:latest user123-VirtualBox Ready Ready 1 second ago
irdq9qc5duy6 _ dfp.1 dockerflow/docker-flow-proxy:latest user123-VirtualBox Shutdown Failed 2 seconds ago "task: non-zero exit (143): do…"
wv400bc1aqwa _ dfp.1 dockerflow/docker-flow-proxy:latest user123-VirtualBox Shutdown Failed 25 seconds ago "task: non-zero exit (143): do…"
xldnamcxlfcq _ dfp.1 dockerflow/docker-flow-proxy:latest user123-VirtualBox Shutdown Failed 49 seconds ago "task: non-zero exit (143): do…"
Can you please upgrade to the tag dockerflow/docker-flow-proxy:18.04.30-41 and add the env. var. DNS_LOOKUP_PAUSE_MS=1000. That will add a pause of 1000 milliseconds (or whichever value you put) before the lookup is performed.
This is not a "real" fix but more of a test to see whether the issue is really due to DNS propagation being slow.
I tried what you said and it worked after a few tries. Here are the service logs.
Command
sudo docker service create --name dfp -p 80:80 -p 443:443 --network proxy -e SERVICE_NAME=dfp -e DNS_LOOKUP_PAUSE_MS=1000 -e LISTENER_ADDRESS=swarm-listener dockerflow/docker-flow-proxy:18.04.30-41
Results
dfp.1.v21q2o15kfl2@user123-VirtualBox | 2018/05/01 01:34:18 Starting HAProxy
dfp.1.xre52aiggc7l@user123-VirtualBox | 2018/05/01 01:33:30 Starting HAProxy
dfp.1.v21q2o15kfl2@user123-VirtualBox | 2018/05/01 01:34:19 Getting certs from http://198.105.254.24:8080/v1/docker-flow-proxy/certs
dfp.1.wwon83wvmow9@user123-VirtualBox | 2018/05/01 01:33:54 Starting HAProxy
dfp.1.xre52aiggc7l@user123-VirtualBox | 2018/05/01 01:33:32 Getting certs from http://198.105.254.24:8080/v1/docker-flow-proxy/certs
dfp.1.rkphvbyuwe1j@user123-VirtualBox | 2018/05/01 01:34:42 Starting HAProxy
dfp.1.rkphvbyuwe1j@user123-VirtualBox | 2018/05/01 01:34:43 Starting "Docker Flow: Proxy"
dfp.1.wwon83wvmow9@user123-VirtualBox | 2018/05/01 01:33:55 Getting certs from http://198.105.254.24:8080/v1/docker-flow-proxy/certs
dfp.1.rkphvbyuwe1j@user123-VirtualBox | 2018/05/01 01:34:48 Got configuration from http://swarm-listener:8080.
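The log entries above come from several task attempts and are interleaved, which makes the order of events hard to follow. A minimal sketch for putting them in chronological order, assuming every entry keeps the "task@node | YYYY/MM/DD HH:MM:SS message" layout shown here (sort_service_logs is a hypothetical helper name):

```shell
#!/bin/sh
# sort_service_logs: reads service log lines on stdin and prints them ordered
# by the text after the first "|", which starts with the timestamp.
# Because the timestamp is fixed-width, a plain byte-wise sort is chronological.
sort_service_logs() {
  LC_ALL=C sort -t '|' -k 2
}
```

It could be used as: sudo docker service logs dfp 2>&1 | sort_service_logs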
I removed it and tried again, and this time it was unsuccessful, even after I increased the delay in that environment variable.
I let it run overnight and eventually it succeeded.
dfp.1.zvjlruf6i3c5@user123-VirtualBox | 2018/05/02 05:55:13 Starting HAProxy
dfp.1.zv3ozir6wc13@user123-VirtualBox | 2018/05/02 02:10:53 Starting HAProxy
dfp.1.zyhaya3e1t0d@user123-VirtualBox | 2018/05/02 04:25:43 Starting HAProxy
dfp.1.zyhaya3e1t0d@user123-VirtualBox | 2018/05/02 04:25:44 Getting certs from http://198.105.254.24:8080/v1/docker-flow-proxy/certs
dfp.1.zvjlruf6i3c5@user123-VirtualBox | 2018/05/02 05:55:14 Getting certs from http://198.105.254.24:8080/v1/docker-flow-proxy/certs
dfp.1.zv3ozir6wc13@user123-VirtualBox | 2018/05/02 02:10:54 Getting certs from http://198.105.254.24:8080/v1/docker-flow-proxy/certs
dfp.1.zxrxb82r2ye5@user123-VirtualBox | 2018/05/02 02:39:38 Starting HAProxy
dfp.1.zxrxb82r2ye5@user123-VirtualBox | 2018/05/02 02:39:40 Getting certs from http://198.105.254.24:8080/v1/docker-flow-proxy/certs
dfp.1.lbv090hyqjlu@user123-VirtualBox | 2018/05/02 06:06:36 Starting HAProxy
dfp.1.lbv090hyqjlu@user123-VirtualBox | 2018/05/02 06:06:37 Getting certs from http://198.105.254.24:8080/v1/docker-flow-proxy/certs
dfp.1.lbv090hyqjlu@user123-VirtualBox | 2018/05/02 06:06:37 Getting certs from http://198.105.244.24:8080/v1/docker-flow-proxy/certs
dfp.1.lbv090hyqjlu@user123-VirtualBox | 2018/05/02 06:06:37 Found 0 certs
dfp.1.lbv090hyqjlu@user123-VirtualBox | 2018/05/02 06:06:37 Starting "Docker Flow: Proxy"
dfp.1.lbv090hyqjlu@user123-VirtualBox | 2018/05/02 06:06:42 Got configuration from http://swarm-listener:8080.
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
lbv090hyqjlu dfp.1 dockerflow/docker-flow-proxy:18.04.30-41 user123-VirtualBox Running Running 7 hours ago
zvjlruf6i3c5 _ dfp.1 dockerflow/docker-flow-proxy:18.04.30-41 user123-VirtualBox Shutdown Failed 7 hours ago "task: non-zero exit (143): do…"
zyhaya3e1t0d _ dfp.1 dockerflow/docker-flow-proxy:18.04.30-41 user123-VirtualBox Shutdown Failed 9 hours ago "task: non-zero exit (143): do…"
zxrxb82r2ye5 _ dfp.1 dockerflow/docker-flow-proxy:18.04.30-41 user123-VirtualBox Shutdown Failed 11 hours ago "task: non-zero exit (143): do…"
zv3ozir6wc13 _ dfp.1 dockerflow/docker-flow-proxy:18.04.30-41 user123-VirtualBox Shutdown Failed 11 hours ago "task: non-zero exit (143): do…"
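To keep track of how many attempts failed before one finally stayed up, the task listing can be summarized with a small filter. This is only a sketch that assumes the plain-text layout shown above, where failed tasks carry the word Failed in the CURRENT STATE column; count_failed_tasks is a hypothetical helper name.

```shell
#!/bin/sh
# count_failed_tasks: reads "docker service ps" output on stdin and prints
# the number of lines whose state column contains the word "Failed".
count_failed_tasks() {
  # n + 0 forces a numeric 0 when no line matched.
  awk '/ Failed / { n++ } END { print n + 0 }'
}
```

It could be used as: sudo docker service ps dfp | count_failed_tasks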
It seems that the problem is in latency. It takes a lot of time until DNS info is propagated across your cluster or there's a networking problem. In either case, I'm not sure that's something we can fix in DFP.
In this instance of the testing I was actually only running my master node to take those kinds of things out of the equation. There shouldn't have been any networking issues aside from the local networking in docker itself.
Not sure if you've made some changes, but I updated my copy of the repo on master and built the container, and it now loads consistently every time. If I switch back to the container on dockerflow/docker-flow-proxy, it fails.
user123@user123-VirtualBox:~/projects/docker-flow-proxy$ sudo docker service create --name dfp -p 80:80 -p 443:443 --network proxy -e SERVICE_NAME=dfp -e LISTENER_ADDRESS=swarm-listener docker-flow-proxy
image docker-flow-proxy:latest could not be accessed on a registry to record its digest. Each node will access docker-flow-proxy:latest independently, possibly leading to different nodes running different versions of the image.
l82lynr5vojojn98t0w0txswa
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
It all seems to be working correctly now using dockerflow/docker-flow-proxy.
I'm guessing you've performed the needed magic for it to work. Thanks for the help.
I'm using VirtualBox with 3 Linux/Ubuntu 17.10.1 VMs: 1 manager and 2 workers. I read your responses in the other issues about it being something with the overlay network, but I'm at a loss as to what to do or how to troubleshoot that. I just started experimenting with Docker yesterday.
docker service logs dfp
Commands run: