atosatto / ansible-dockerswarm

Docker Engine clustering using "Swarm Mode" and Ansible
https://galaxy.ansible.com/atosatto/docker-swarm/
MIT License
262 stars 149 forks

dict object error? #2

Closed: riemers closed this issue 8 years ago

riemers commented 8 years ago

I tried to install this setup with only 1 node and 1 manager (since I didn't have my third server yet), and now it fails on this:

TASK [atosatto.docker-swarm : Join the Swarm nodes.] ***************************
fatal: [xxxxxxx]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'stdout'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 28, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Join the Swarm nodes.\n  ^ here\n"}

I tried removing Docker on the nodes and running it again, but hit the same issue. I have another group of 3 servers on which the playbook ran just fine; only this group fails. Is there anything I am overlooking? Some kind of fact cache that needs clearing, or some setting on the servers?

atosatto commented 8 years ago

Hi @riemers!

Could you please share the Ansible inventory you're using with the failing playbook? There may be something strange happening with the group definitions. ;)

Thank you!

riemers commented 8 years ago

[production]
1627b
1626i
1629j

[docker_engine:children]
production

[docker_swarm_manager]
1627b

[docker_swarm_worker]
1626i
1629j

atosatto commented 8 years ago

But here I see a 3-node cluster. Is this the failing example?

riemers commented 8 years ago

Yes, it was with 2 nodes. It failed, then I added the 3rd node later. Perhaps that is the culprit.

atosatto commented 8 years ago

OK. Then I'll run my tests with 1 manager and 1 worker: it smells like a bug! ;)

riemers commented 8 years ago

Let's hope so. Normally I can read and fix errors myself, but this one has me pulling my hair out; it could be just something stupid on my end. All my other setups worked just fine.

atosatto commented 8 years ago

Hey @riemers, which version of the role are you using?

Using v1.1.1 with the following inventory

[docker_engine]
ansible-dockerswarm-01
ansible-dockerswarm-02

[docker_swarm_manager]
ansible-dockerswarm-01

[docker_swarm_worker]
ansible-dockerswarm-02

and this playbook

- hosts: all
  roles:
    - { role: ansible-dockerswarm }

The provisioning succeeds:

PLAY RECAP *********************************************************************
ansible-dockerswarm-01: ok=12   changed=5    unreachable=0    failed=0
ansible-dockerswarm-02: ok=9    changed=4    unreachable=0    failed=0

riemers commented 8 years ago

The role that comes from galaxy is 1.1.0

atosatto commented 8 years ago

Oops! Here is the problem: version v1.1.1 is not aligned with master! Could you please try with the now released v1.1.1 on Galaxy?

Sorry for the issue. :(

riemers commented 8 years ago

Whoops, clicked the wrong button. But sadly the issue is still there.

TASK [atosatto.docker-swarm : Init "Swarm Mode" on the first manager.] *********
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:8
skipping: [1626i] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
skipping: [1629j] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
skipping: [1627b] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [atosatto.docker-swarm : Get the worker join-token.] **********************
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:15
skipping: [1626i] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [atosatto.docker-swarm : Get the manager join-token.] *********************
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:22
skipping: [1626i] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [atosatto.docker-swarm : Export the address of the first Swarm manager as a fact.] ***
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:29
skipping: [1626i] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [atosatto.docker-swarm : Join the pending Swarm worker nodes.] ************
task path: /Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml:35
fatal: [1626i]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'stdout'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Join the pending Swarm worker nodes.\n  ^ here\n"}
fatal: [1629j]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'stdout'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Join the pending Swarm worker nodes.\n  ^ here\n"}
skipping: [1627b] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

Does the output above help at all? I checked whether docker info on every node actually returns "Swarm:", and it does. Same for the token command: it does return something.

atosatto commented 8 years ago

Could you please confirm that this is the inventory you're running your playbook against?

[production]
1627b
1626i
1629j

[docker_engine:children]
production

[docker_swarm_manager]
1627b

[docker_swarm_worker]
1626i
1629j

Looking at your logs, it seems the problem could be related to my use of run_once: true in combination with when: inventory_hostname == groups['docker_swarm_manager'][0]. I'll try to find a workaround for this.
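For readers following along, the pattern in question can be sketched roughly as below. This is a reconstruction from the task names and error messages in this thread, not the role's actual source: the module choice, the register name docker_worker_token (mentioned later in the thread) and the manager_addr variable are assumptions made for illustration.

```yaml
# Sketch only: assumed shape of the failing tasks, not copied from the role.
- name: Get the worker join-token.
  command: docker swarm join-token -q worker
  register: docker_worker_token
  run_once: true
  # run_once executes the task for the first host of the play only, and the
  # condition below is evaluated for that host. If it happens to be a worker
  # (1626i in the log above), the task is skipped and the skipped result is
  # what ends up registered.
  when: inventory_hostname == groups['docker_swarm_manager'][0]

- name: Join the pending Swarm worker nodes.
  # Fails with "'dict object' has no attribute 'stdout'" because a skipped
  # result carries no stdout key.
  # manager_addr is a hypothetical variable; 2377 is Swarm's default port.
  command: >
    docker swarm join --token {{ docker_worker_token.stdout }}
    {{ manager_addr }}:2377
  when: inventory_hostname in groups['docker_swarm_worker']
```

This would also explain why another group of servers ran cleanly: if the first host of the play is the first manager, the condition holds and the token is registered as expected.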

riemers commented 8 years ago

I can confirm, it is the same list. I removed the "run once" so the task would run again as a test, but it still fails with the same error. The output of the task that had run_once was:

ok: [1627b] => {"changed": false, "cmd": "docker swarm join-token -q worker", "delta": "0:00:00.028299", "end": "2016-09-09 12:45:10.549948", "invocation": {"module_args": {"_raw_params": "docker swarm join-token -q worker", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 0, "start": "2016-09-09 12:45:10.521649", "stderr": "", "stdout": "SWMTKN-1-0xldxl8pzrh6wyxchmnqjes78sey63lvdg239u8cukly80dcxf-a018pxahxc4mvb0vdbbim1plh", "stdout_lines": ["SWMTKN-1-0xldxl8pzrh6wyxchmnqjes78sey63lvdg239u8cukly80dcxf-a018pxahxc4mvb0vdbbim1plh"], "warnings": []}

atosatto commented 8 years ago

Yes. The run_once is required to make the docker_worker_token variable accessible to all the hosts. I'll code a workaround for this and then ping you.
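One common way to keep run_once while guaranteeing that the command runs on the manager is to delegate the task instead of guarding it with when. The sketch below illustrates that general Ansible pattern; it is an assumption about what a workaround could look like, not necessarily the fix that was later committed to the role.

```yaml
# Sketch of a delegate_to + run_once pattern (not the role's actual fix).
- name: Get the worker join-token.
  command: docker swarm join-token -q worker
  register: docker_worker_token
  changed_when: false
  # Always execute on the first manager, whichever host the play visits first.
  delegate_to: "{{ groups['docker_swarm_manager'][0] }}"
  # Run a single time; the registered result is shared with the hosts in the batch.
  run_once: true
```

With this shape there is no conditional that can be evaluated against the wrong host, so docker_worker_token.stdout should be defined wherever the join task later reads it.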

Thanks for helping me figure out the root cause of this issue. 😄

riemers commented 8 years ago

Just to confirm my message above: I removed the "run once" from your role so it would at least set the value, but after that it still fails. If you understand what is going wrong, that's fine, but I don't have a clear picture in my head. I'll await your fix then 👍

atosatto commented 8 years ago

Hey @riemers, I have committed a fix for the issue you reported to the branch issue-2.

Before actually merging this into master and making a new release, I would like you to test it, if possible. Thx! :)

riemers commented 8 years ago

I took the code from your branch, which was only that swarm YAML file, then reran it. Below is the output:

fatal: [1626i]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'docker_swarm_addr'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 32, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Export the address of the first Swarm manager as a fact.\n  ^ here\n"}
fatal: [1629j]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'docker_swarm_addr'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 32, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Export the address of the first Swarm manager as a fact.\n  ^ here\n"}
fatal: [1627b]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'docker_swarm_addr'\n\nThe error appears to have been in '/Users/riemers/roles/atosatto.docker-swarm/tasks/swarm_cluster.yml': line 32, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Export the address of the first Swarm manager as a fact.\n  ^ here\n"}

atosatto commented 8 years ago

It looks like an issue with the way the docker_manager_address fact is exported. Could you please try again with the swarm_cluster.yml file in 55cac85?

riemers commented 8 years ago

I'm not in my work environment anymore; I can check back with you Monday morning. Thanks for the support so far.

atosatto commented 8 years ago

This time I'm quite confident it will work! ;) Anyway let's wait until Monday morning!

Thank you for helping me with the debugging.

Andrea Tosatto

riemers commented 8 years ago

That did the trick, it works fine now! 👍

atosatto commented 8 years ago

Very good! 🎉

I've just released v1.2.0, including the fix for the issue you reported. I think you can now safely update the reference to the role in the requirements.yml file and do a forced update.
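As a rough illustration of that update, a requirements.yml entry pinning the new release might look like the following. The role name is assumed from the atosatto.docker-swarm prefix shown in the task output earlier in this thread; adjust it if your file references the role differently.

```yaml
# requirements.yml (sketch; role name assumed from the log output above)
- src: atosatto.docker-swarm
  version: v1.2.0
```

Running ansible-galaxy install -r requirements.yml --force should then replace the already installed copy of the role with the pinned version.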

Thank you!