Closed: riemers closed this issue 8 years ago.
Hi @riemers!
Could you please share the Ansible inventory you're using to provision the failing playbook? There should be something strange happening with group definitions. ;)
Thank you!
[production]
1627b
1626i
1629j
[docker_engine:children]
production
[docker_swarm_manager]
1627b
[docker_swarm_worker]
1626i
1629j
But here I see a 3-node cluster. Is this the failing example?
Yes, it started with 2 nodes. It failed, then I added the 3rd node later. Perhaps that might be the culprit.
Ok. Then I'll run my tests with 1 master and 1 worker: it smells like a bug! ;)
Let's hope so; normally I can read and fix errors myself. But this one has me pulling my hair out; it could just be something stupid on my end. All my other setups worked just fine.
Hey @riemers, which version of the role are you using?
Using v1.1.1
with the following inventory
[docker_engine]
ansible-dockerswarm-01
ansible-dockerswarm-02
[docker_swarm_manager]
ansible-dockerswarm-01
[docker_swarm_worker]
ansible-dockerswarm-02
and this playbook
- hosts: all
roles:
- { role: ansible-dockerswarm }
The provisioning succeeds:
PLAY RECAP *********************************************************************
ansible-dockerswarm-01: ok=12 changed=5 unreachable=0 failed=0
ansible-dockerswarm-02: ok=9 changed=4 unreachable=0 failed=0
The role that comes from Galaxy is 1.1.0.
Oops! Here's the problem: version v1.1.1 was not aligned with master! Could you please try again with the now-released v1.1.1 on Galaxy? Sorry for the issue. :(
Whoops, clicked the wrong button. But sadly the issue is still there.
TASK [atosatto.docker-swarm : Init "Swarm Mode" on the first manager.] *********
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:8
skipping: [1626i] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
skipping: [1629j] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
skipping: [1627b] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
TASK [atosatto.docker-swarm : Get the worker join-token.] **********************
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:15
skipping: [1626i] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
TASK [atosatto.docker-swarm : Get the manager join-token.] *********************
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:22
skipping: [1626i] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
TASK [atosatto.docker-swarm : Export the address of the first Swarm manager as a fact.] ***
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:29
skipping: [1626i] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
TASK [atosatto.docker-swarm : Join the pending Swarm worker nodes.] ************
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:35
fatal: [1626i]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'stdout'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Join the pending Swarm worker nodes.\n ^ here\n"}
fatal: [1629j]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'stdout'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Join the pending Swarm worker nodes.\n ^ here\n"}
skipping: [1627b] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
Does the output above help at all? I checked whether "docker info" actually reports "Swarm:" on all nodes, and it does. Same for the join-token command; it does return something.
Could you please confirm that this is the inventory you're running your playbook against?
[production]
1627b
1626i
1629j
[docker_engine:children]
production
[docker_swarm_manager]
1627b
[docker_swarm_worker]
1626i
1629j
Because looking at your logs, it seems the problem could be related to my use of run_once: true in combination with when: inventory_hostname == groups['docker_swarm_manager'][0]. I'll try to find a workaround for this.
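For context, the failing pattern can be sketched roughly like this. This is an illustrative reconstruction based on the task names and errors in the log above, not the role's actual source; swarm_manager_address is a hypothetical variable used only for illustration.

```yaml
# Illustrative sketch of the run_once + when interaction (assumed code,
# reconstructed from the task names in the log; not the role's source).
#
# run_once: true makes Ansible execute the task on a single host of the
# play. If the when: condition is not met on that host, the task is
# skipped, and the registered variable only contains the "skipped"
# result, which has no stdout key.
- name: Get the worker join-token.
  command: docker swarm join-token -q worker
  register: docker_worker_token
  run_once: true
  when: inventory_hostname == groups['docker_swarm_manager'][0]

# Referencing docker_worker_token.stdout on the workers then fails with
# "'dict object' has no attribute 'stdout'", exactly as in the log.
# swarm_manager_address is a hypothetical variable for illustration.
- name: Join the pending Swarm worker nodes.
  command: "docker swarm join --token {{ docker_worker_token.stdout }} {{ swarm_manager_address }}"
  when: inventory_hostname in groups['docker_swarm_worker']
```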
I can confirm it is the same list. I removed the "run once" so it would run again as a test, but it still fails with the same error. The output of the run_once task was:
ok: [1627b] => {"changed": false, "cmd": "docker swarm join-token -q worker", "delta": "0:00:00.028299", "end": "2016-09-09 12:45:10.549948", "invocation": {"module_args": {"_raw_params": "docker swarm join-token -q worker", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 0, "start": "2016-09-09 12:45:10.521649", "stderr": "", "stdout": "SWMTKN-1-0xldxl8pzrh6wyxchmnqjes78sey63lvdg239u8cukly80dcxf-a018pxahxc4mvb0vdbbim1plh", "stdout_lines": ["SWMTKN-1-0xldxl8pzrh6wyxchmnqjes78sey63lvdg239u8cukly80dcxf-a018pxahxc4mvb0vdbbim1plh"], "warnings": []}
Yes. The run_once is required to make the docker_worker_token variable accessible to all the hosts. I'll code a workaround for this and then ping you.
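One common way to make a value registered on one host readable everywhere is a hostvars lookup; here is a minimal sketch, with variable names assumed from the discussion above rather than taken from the role:

```yaml
# Sketch: read a variable registered on the first manager from any host
# via hostvars, instead of relying on run_once to propagate it.
# Assumes docker_worker_token was registered on that manager host.
- name: Join the pending Swarm worker nodes.
  command: >
    docker swarm join
    --token {{ hostvars[groups['docker_swarm_manager'][0]]['docker_worker_token']['stdout'] }}
    {{ hostvars[groups['docker_swarm_manager'][0]]['ansible_default_ipv4']['address'] }}:2377
  when: inventory_hostname in groups['docker_swarm_worker']
```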
Thanks for helping me figure out the root cause of this issue. 😄
Just to confirm my message above: I removed the "run once" from your role so it would at least set the value, but it still fails after that. If you understand what is going wrong, that's fine, but I don't have a clear picture in my head. I'll await your fix then 👍
Hey @riemers, I have committed a fix for the issue you reported to branch issue-2. Before actually merging this into master and making a new release, I would like, if possible, to have you test it. Thx! :)
I took the code from your branch (it was only that swarm yml file) and reran it. Below is the output:
fatal: [1626i]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'docker_swarm_addr'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 32, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Export the address of the first Swarm manager as a fact.\n ^ here\n"}
fatal: [1629j]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'docker_swarm_addr'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 32, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Export the address of the first Swarm manager as a fact.\n ^ here\n"}
fatal: [1627b]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'docker_swarm_addr'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 32, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Export the address of the first Swarm manager as a fact.\n ^ here\n"}
It looks like an issue with the way the docker_manager_address fact is exported. Could you please try again with the swarm_cluster.yml file in 55cac85?
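For reference, one pattern for exporting the manager address as a fact that other hosts can then read is sketched below. This is an assumption about the general approach, not the contents of 55cac85; only the docker_manager_address and docker_worker_token names come from the thread above.

```yaml
# Sketch: set the fact only on the first manager, then have the other
# hosts look it up through hostvars rather than expecting a local copy.
- name: Export the address of the first Swarm manager as a fact.
  set_fact:
    docker_manager_address: "{{ ansible_default_ipv4.address }}:2377"
  when: inventory_hostname == groups['docker_swarm_manager'][0]

- name: Join the pending Swarm worker nodes.
  command: >
    docker swarm join
    --token {{ hostvars[groups['docker_swarm_manager'][0]].docker_worker_token.stdout }}
    {{ hostvars[groups['docker_swarm_manager'][0]].docker_manager_address }}
  when: inventory_hostname in groups['docker_swarm_worker']
```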
I'm not in my work environment anymore; I can check back with you Monday morning. Thanks for the support so far.
This time I'm quite confident it will work! ;) Anyway let's wait until Monday morning!
Andrea Tosatto
On Sep 9, 2016 21:14, "Erik-jan Riemers" notifications@github.com wrote:
That did the trick, it worked fine now! 👍
Very good! 🎉
I've just released v1.2.0, including the fix for the issue you reported. You can now safely update the reference to the role in your requirements.yml file and do a forced update.
Thank you!
I tried to install this setup with only 1 node and 1 manager (since I didn't have my 3rd server) and now it fails on this:
I tried removing Docker on the nodes and running it again, but got the same issue. I have another group of 3 servers that ran the playbook just fine; only this group fails. Is there anything I am overlooking? Some kind of fact cache that needs clearing, or some setting on the servers?