Open markdorison opened 7 years ago
@markdorison first of all thanks for submitting a ticket!
Second, as you're getting Permission denied (publickey), could you please check the ssh-agent to ensure keys have been added?
You can use the following:
echo $SSH_AGENT_PID
eval $(ssh-agent)
ssh-add
ssh-add -l
This will rule out any obvious errors. I intend to go back through this role again for a new release, much like what I've just completed for my curl role.
@fubarhouse: Thanks!
(I work with @markdorison and it was me who ran into this issue). The machine the Ansible playbook was running from does have a passphrase-protected ssh-key to the remote machine, but we know it was ssh-added because the playbook was able to connect to the remote machine at all (and successfully complete all tasks up to the golang role).
But reading through the code, it looks like it's ansible_ssh_user being undefined that's forcing the role's use of synchronize instead of shell. This playbook is usually run without -u, and I don't think we'd found the fubarhouse_user variable.
I'll run the playbook again at a quiet time with a value for user or fubarhouse_user to see if that's the issue & report back.
@ctorgalson that actually describes something I can diagnose much better, so I'll look into this for you and respond.
Under no circumstance should fubarhouse_user not be assigned a value, but at least now I know there is a case where it may not be.
Are you able to provide any given reason the following two tasks would be skipped?
- name: "Go-Lang | Define user variable for ssh use"
  set_fact:
    fubarhouse_user: "{{ ansible_ssh_user }}"
  when: ansible_ssh_user is defined and fubarhouse_user is not defined
- name: "Go-Lang | Define user variable for non-ssh use"
  set_fact:
    fubarhouse_user: "{{ ansible_user_id }}"
  when: ansible_ssh_user is not defined and fubarhouse_user is not defined
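For illustration, the two tasks above could be collapsed into a single task that always assigns a value. This is a minimal sketch, not the role's actual code: the task name, the inclusion of ansible_user, and the chained default filters are assumptions made for the example.

```yaml
# Sketch only, not taken from the role.
# Picks the first defined value: ansible_user (newer Ansible releases),
# then ansible_ssh_user (older releases), then ansible_user_id.
# ansible_user_id is a gathered fact, so fact gathering must be enabled.
- name: "Go-Lang | Define user variable (single-task sketch)"
  set_fact:
    fubarhouse_user: "{{ ansible_user | default(ansible_ssh_user) | default(ansible_user_id) }}"
  when: fubarhouse_user is not defined
```

With one task there is no pair of mutually exclusive conditions, so there is no combination of defined and undefined variables under which both branches are skipped.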
@fubarhouse Thanks. I think I can explain what's failing.
1. ansible -i hosts.yml -m setup hostname shows me that neither ansible_ssh_user nor ansible_user (since this is Ansible 2.2.2.0) is defined, and ansible_user_id is set to ctorgalson. Since ansible_ssh_user is undefined, the role sets fubarhouse_user to the value of ansible_user_id.
2. Since ansible_ssh_user is undefined, the role will attempt to use the synchronize task.
3. The synchronize task uses the value of fubarhouse_user for become_user. This is the direct cause of the error we see (but possibly not the actual problem):
"cmd": "/usr/bin/rsync --delay-updates -F --compress --delete-after --archive --rsh 'ssh -S none -o StrictHostKeyChecking=no' --rsync-path=\"sudo rsync\" --out-format='<<CHANGED>>%i %n%L' \"/tmp/go/\" \"xxx.xxx.xxx.xxx:/root/go\"",
Even though ctorgalson is in the sudoers list, that user should become root in order to move files into root's home directory (i.e. it should run rsync with sudo rsync ... and not sudo -u ctorgalson rsync ...); this suggests that the actual problem is something else...
4. The rsync command shown in the error above attempts to copy files to xxx.xxx.xxx.xxx:/root/go. This shows that the {{ GOROOT }} fact is set to /root/go even though the code appears to try to set it to the fubarhouse_user's home directory.
I think (4) is the core issue, though I'm not sure what an appropriate solution might be--installing in individual users' home directories is not a viable solution for us :)
PS: according to Ansible's documentation, the Synchronize module "...is run and originates on the local host where Ansible is being run". Which sounds like the generated rsync command above might always fail (since the get_url task downloads to the remote host, but the Ansible-generated rsync command's source is /tmp/go and not e.g. xxx.xxx.xxx.xxx:/tmp/go).
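The point in the PS can be sketched as a task. By default synchronize reads src from the machine running Ansible; the module's documentation describes delegating the task to the inventory host itself so that both paths are resolved on the remote machine. The paths below are copied from the rsync command above, and the task name is hypothetical. Whether this also resolves the publickey error is not established here.

```yaml
# Sketch only: copy between two directories on the same remote host.
# delegate_to makes the remote host the source of the transfer,
# instead of the control machine where the playbook is run.
- name: "Go-Lang | Move to installation directory (sketch)"
  synchronize:
    src: /tmp/go/
    dest: /root/go
  delegate_to: "{{ inventory_hostname }}"
```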
@ctorgalson I've actually been doing a bit of work on similar things, and I've rolled some more changes into the dev branch and kicked off some tests. Here's a summary.
cp... task. The sync module is inconsistent, but it's all Ansible offers for that purpose...
You can test this out on the dev-2.5.x branch, but I'll get a release out in the next day for you. I would be appreciative if you could tell me if the above changes solve your problem!
Link to tests:
2.5.0 is officially released, available via the galaxy.
As previously stated, I'd like to know if the changes have resolved your problems.
@fubarhouse I updated the role to 2.5.0. When attempting a run it fails, but in a different place:
TASK [fubarhouse.golang : Go-Lang | Run get commands] ******************************************************************************************************************************************************* failed: [jenkins.chromatic.is] (item={u'url': u'github.com/StackExchange/dnscontrol', u'name': u'dnscontrol'}) => {"changed": false, "cmd": "/root/go/bin/go get -u github.com/StackExchange/dnscontrol", "delta": "0:00:02.402064", "end": "2017-09-19 19:18:47.305134", "failed": true, "item": {"name": "dnscontrol", "url": "github.com/StackExchange/dnscontrol"}, "rc": 2, "start": "2017-09-19 19:18:44.903070", "stderr": "# runtime\n/root/go/src/runtime/mstkbar.go:151:10: debug.gcstackbarrieroff undefined (type struct { allocfreetrace int32; cgocheck int32; efence int32; gccheckmark int32; gcpacertrace int32; gcshrinkstackoff int32; gcrescanstacks int32; gcstoptheworld int32; gctrace int32; invalidptr int32; sbrk int32; scavenge int32; scheddetail int32; schedtrace int32 } has no field or method gcstackbarrieroff)\n/root/go/src/runtime/mstkbar.go:162:24: division by zero\n/root/go/src/runtime/mstkbar.go:162:43: invalid expression unsafe.Sizeof(composite literal)\n/root/go/src/runtime/mstkbar.go:162:44: undefined: stkbar\n/root/go/src/runtime/mstkbar.go:212:4: gp.stkbar undefined (type *g has no field or method stkbar)\n/root/go/src/runtime/mstkbar.go:213:15: gp.stkbar undefined (type *g has no field or method stkbar)\n/root/go/src/runtime/mstkbar.go:216:23: undefined: stackBarrierPC\n/root/go/src/runtime/mstkbar.go:226:28: gp.stkbarPos undefined (type *g has no field or method stkbarPos)\n/root/go/src/runtime/mstkbar.go:227:19: gp.stkbarPos undefined (type *g has no field or method stkbarPos)\n/root/go/src/runtime/mstkbar.go:248:41: undefined: stkbar\n/root/go/src/runtime/mstkbar.go:227:19: too many errors", "stderr_lines": ["# runtime", "/root/go/src/runtime/mstkbar.go:151:10: debug.gcstackbarrieroff undefined (type struct { allocfreetrace int32; cgocheck int32; efence 
int32; gccheckmark int32; gcpacertrace int32; gcshrinkstackoff int32; gcrescanstacks int32; gcstoptheworld int32; gctrace int32; invalidptr int32; sbrk int32; scavenge int32; scheddetail int32; schedtrace int32 } has no field or method gcstackbarrieroff)", "/root/go/src/runtime/mstkbar.go:162:24: division by zero", "/root/go/src/runtime/mstkbar.go:162:43: invalid expression unsafe.Sizeof(composite literal)", "/root/go/src/runtime/mstkbar.go:162:44: undefined: stkbar", "/root/go/src/runtime/mstkbar.go:212:4: gp.stkbar undefined (type *g has no field or method stkbar)", "/root/go/src/runtime/mstkbar.go:213:15: gp.stkbar undefined (type *g has no field or method stkbar)", "/root/go/src/runtime/mstkbar.go:216:23: undefined: stackBarrierPC", "/root/go/src/runtime/mstkbar.go:226:28: gp.stkbarPos undefined (type *g has no field or method stkbarPos)", "/root/go/src/runtime/mstkbar.go:227:19: gp.stkbarPos undefined (type *g has no field or method stkbarPos)", "/root/go/src/runtime/mstkbar.go:248:41: undefined: stkbar", "/root/go/src/runtime/mstkbar.go:227:19: too many errors"], "stdout": "", "stdout_lines": []}
The cause of this failure seems to be further upstream in the playbook, as a bunch of tasks are being skipped and go is not being installed successfully. Investigating further.
@markdorison,
I have just identified the problem, so I'll get a fix under way asap.
Edit: see 0231ee845e02f153b3745b6f3716f9b50306606a
I'm just waiting for some tests (now running) to complete and I'll release it.
Edit:
2.6.1 is released, which includes the above commit.
Changelog will be added tonight, but it's available via the galaxy.
Edit (again):
If the distribution tasks are skipping, the removal of the old Go install will also fail.
It's my recommendation to delete your GOROOT in the event this fails again.
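As a rough sketch, that cleanup could be done with a task like the one below before re-running the role. It assumes the GOROOT variable is already defined when the task runs (for example as a playbook variable) and that privilege escalation is needed because the directory is under /root. The task name and the guard condition are illustrative, not part of the role.

```yaml
# Sketch only: remove a stale Go installation so the role can reinstall it.
# The guard avoids calling the file module with an undefined or empty path.
- name: "Remove existing Go installation (sketch)"
  file:
    path: "{{ GOROOT }}"
    state: absent
  become: true
  when: GOROOT is defined and GOROOT | length > 0
```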
This role has worked for me in the past but I am now encountering the following error on the "Moving to installation directory" task. I redacted the information about the box it is running on.
fatal: [FQDN_REDACTED -> IP_REDACTED]: FAILED! => {"changed": false, "cmd": "/usr/bin/rsync --delay-updates -F --compress --delete-after --archive --rsh 'ssh -S none -o StrictHostKeyChecking=no' --rsync-path=\"sudo rsync\" --out-format='<<CHANGED>>%i %n%L' \"/tmp/go/\" \"IP_REDACTED:/root/go\"", "failed": true, "msg": "Warning: Permanently added 'IP_REDACTED' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey).\r\nrsync: connection unexpectedly closed (0 bytes received so far) [sender]\nrsync error: unexplained error (code 255) at io.c(226) [sender=3.1.0]\n", "rc": 255}