fgci-org / fgci-ansible

:microscope: Collection of the Finnish Grid and Cloud Infrastructure Ansible playbooks
MIT License

ansible-pull-script fails after inclusion of ansible-role-lldpd #183

Closed: jabl closed this issue 7 years ago

jabl commented 7 years ago

TASK [ansible-role-lldpd : transmit tlvs with lldptool on RedHat if we just installed lldpd] *
Friday 27 January 2017 12:15:01 +0200 (0:00:00.212) 0:04:45.382 **
fatal: [gpu10]: FAILED! => {"failed": true, "msg": "The conditional check 'ansible_{{ item[0] }}['active'] == true' failed. The error was: unexpected char u'b' at 20
 line 1

The error appears to have been in '/root/.ansible/pull/workdir/roles/ansible-role-lldpd/tasks/main.yml': line 21, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: transmit tlvs with lldptool on RedHat if we just installed lldpd
 ^ here
"}

jabl commented 7 years ago

FWIW, names of interfaces on that node:


[root@gpu10 ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 78:e7:d1:22:46:40 brd ff:ff:ff:ff:ff:ff
3: enp4s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 78:e7:d1:22:46:41 brd ff:ff:ff:ff:ff:ff
6: enp5s0d1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 78:e7:d1:22:46:45 brd ff:ff:ff:ff:ff:ff
7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT qlen 1024
    link/infiniband a0:00:04:00:fe:80:00:00:00:00:00:00:78:e7:d1:03:00:22:46:45 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
8: ib0.80a1@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP mode DEFAULT qlen 1024
    link/infiniband a0:00:04:20:fe:80:00:00:00:00:00:00:78:e7:d1:03:00:22:46:45 brd 00:ff:ff:ff:ff:12:40:1b:80:a1:00:00:00:00:00:00:ff:ff:ff:ff
9: ib0.80b1@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP mode DEFAULT qlen 1024
    link/infiniband a0:00:05:00:fe:80:00:00:00:00:00:00:78:e7:d1:03:00:22:46:45 brd 00:ff:ff:ff:ff:12:40:1b:80:b1:00:00:00:00:00:00:ff:ff:ff:ff

martbhell commented 7 years ago

I had a look on gpu10. This fails because some of the interface names in {{ ansible_interfaces }} contain a dot. Quoting the expression in the right place is probably the proper way to solve this, but I haven't been able to figure it out yet. For FGCI it's enough to just strip that dot, which makes ansible skip those non-existent "ib080a1" interfaces.
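
For context on why the dot breaks the task: the conditional splices the interface name into a bare variable reference, so a dotted name renders to an expression the Jinja2 lexer cannot parse. A minimal sketch of a dictionary-lookup alternative, offered as one possible approach rather than the committed fix, and assuming the fact key is simply 'ansible_' plus the name as listed in {{ ansible_interfaces }}:

# the failing check, rendered with item[0] = "ib0.80a1":
#   ansible_ib0.80a1['active'] == true    <- the ".80a1" is where the lexer gives up
# looking the name up as a dictionary key avoids re-parsing it as code:
when:
  - vars['ansible_' + item[0]] is defined
  - vars['ansible_' + item[0]]['active']
# hostvars[inventory_hostname]['ansible_' + item[0]] would be a more conservative lookup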

martbhell commented 7 years ago

Made a commit with changes in https://github.com/CSC-IT-Center-for-Science/ansible-lldpd/commit/de1cc67672660030988603fbbd328da946c56aa7 - too ugly?

jabl commented 7 years ago

If it works, fine for now. lldpd is not relevant for IB, anyway.

martbhell commented 7 years ago

Yeah. I'll keep this open; please let me know how it goes. The update is in fgci-ansible/devel. I hijacked gpu10, modified the role, and ran the lldp tag, and that worked at least.
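
For reference, re-running just that tag against a single host looks something like this (the playbook name here is hypothetical; gpu10 and the lldp tag are from this thread):

ansible-playbook site.yml --limit gpu10 --tags lldp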

jabl commented 7 years ago

Oh, I just realized that on our frontend node we use VLANs, so we have an active interface named "eno1.109@eno1".

martbhell commented 7 years ago

:)

Thanks for beta testing. Maybe this?

replace('ib0.','') 

The role is not on login yet, though.

martbhell commented 7 years ago

Should check whether this replaces all occurrences of the letters i and b...
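
For the record, Jinja2's replace filter substitutes the whole substring rather than individual characters, so only the literal sequence "ib0." is removed. A quick sketch with standard Ansible/Jinja2 filters, including a pattern-based variant that would also cover the VLAN case:

{{ 'ib0.80a1' | replace('ib0.', '') }}           {# -> "80a1"; stray i's and b's are untouched #}
{{ 'eno1.109' | regex_replace('\\.', '') }}      {# -> "eno1109"; strips any dot #}
{# or drop dotted subinterfaces from the loop entirely: #}
{{ ansible_interfaces | reject('search', '\\.') | list }}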

martbhell commented 7 years ago

@jabl - I think it's enough to transmit the hostname only from the "physical"/main interface (eno1 in your case) and not from any subinterfaces.

eno1 is in your {{ internal_interface }} for the login group, so it should be included automatically if the lldp role is used on the frontend. I can add it to login.yml in fgci-ansible/devel, but maybe you'd like to try it yourself first?

I've tried running lldpad on a virtual machine and it seems to "not work". At least "lldptool -i eth0 -t" doesn't show anything.
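
On the lldptool side, enabling transmission of the system name TLV on a single interface would look roughly like this (a sketch based on lldptool's set-lldp/set-tlv shorthands, with eno1 standing in for {{ internal_interface }}):

lldptool -L -i eno1 adminStatus=rxtx            # set-lldp: enable LLDP rx/tx on the port
lldptool -T -i eno1 -V sysName enableTx=yes     # set-tlv: advertise the system name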

jabl commented 7 years ago

To be honest, I don't know why we need lldpd in the first place, as we're just running TCP/IP over normal Ethernet; no DCB, RoCE, or the like.

martbhell commented 7 years ago

The reason I have for it is to be able to see from the switch CLI which port a server is connected to.

It could be useful during the acceptance period, for writing documentation, or for remote data centers.

jabl commented 7 years ago

Ok, fair enough.

When we have needed something like this, we get the MAC address of the machine and then run an expect script that trawls through our switches, grepping all the ports to see where that MAC shows up. Is this better in some way?

martbhell commented 7 years ago

With this there is no need to hunt down MAC addresses. Once LLDP is enabled, a single command on the switch tells you which server is behind which port (the output below also includes ports where the lldptool enableTx sysName setting didn't work):

#show lldp remote-device all

LLDP Remote Device Summary

Local
Interface RemID   Chassis ID          Port ID           System Name
--------- ------- ------------------- ----------------- -----------------
Gi1/0/12  54      00:8C:FA:F0:7A:9A   00:8C:FA:F0:7A:9A   io1.int.fgci.c...
Gi1/0/13  60      00:8C:FA:F0:72:AE   00:8C:FA:F0:72:AE   io2.int.fgci.c...
Gi1/0/14  50      00:8C:FA:EB:DF:A0   00:8C:FA:EB:DF:A0
Gi1/0/15  51      00:8C:FA:F0:75:EA   00:8C:FA:F0:75:EA

martbhell commented 7 years ago

Checked one node in triton and it no longer fails on lldp. Closing this issue.