cpalmer9 closed this 9 years ago
Hi Chris, can you show me what your topology file looks like? A snapshot of the gdb core stack trace and ptmd.log would be great.
Hi, can you instruct me on how to generate the gdb core stack trace?
Chris, I assumed that when you said the process is dying, a core dump is generated. If so, you can open the core file in gdb and run "bt"; that would tell me which part of the code is crashing the process.
If there is no core file, then just give me the ptmd.log and I will try to figure it out.
Please send me the topology file and ptmd.log to begin with.
thanks
kanrajag, Can I email you these files (and I need your address)?
Chris, please just attach them to this thread. You should see the "selecting them" link at the bottom of the edit box. Thanks
I just get: "Unfortunately, we don't support that file type. Try again with a PNG, GIF, or JPG." Can I email them? I can provide a core file.
OK. I might not be able to receive them via email if the file attachments are too big. Please use our FTP site to upload the tarball (logs + core): ftp.cumulusnetworks.com, authenticating with user "anonymous" and providing your email address as the password.
thanks
Thanks, uploaded. MD5 (issue4.tgz) = 5c3f8894e3bfe4faf99137362de00ce5
Chris, I looked at the topology file. I am not sure why you are putting in two entries per "myswitch":"port". We only act on one of the entries. It would help if you explained your workflow here a bit.
For example:
"xxx.tor001.xxx":"fpti1_0_52" -- "xxx.ln007.xxx":"swp1";
"xxx.tor001.xxx":"fpti1_0_52" -- "xxx.ln007.xxx":"fpti1_0_1";
We recently fixed a similar issue in our Cumulus repo, which occurred when we detect "duplicate" edge descriptions. I will try to repro your situation in-house as well.
If you need help making this work on Cumulus switches, please go through our regular support channels and you will get the latest code from our repo. If you need this on a non-Cumulus platform, what is the urgency for a fix? Can it wait until we push the latest from our repo to GitHub?
By the way, we are way behind on updating GitHub. I plan on updating GitHub when we are done with the current work items, hopefully by the first week of April.
Thanks for testing PTM; feedback is welcome!
-k
Hi kanrajag, I don't think the topology file is the issue. The crash only happens when lldpd reports neighbors like the one I provided in the original comment.
The reason we included duplicate entries was that we don't always know the interface name of the adjacent host (it can be a different vendor with different naming), so we include the variations. ptmd seems quite happy with this, and ptmctl shows a 'pass' whenever it matches one of the endpoints (it doesn't show a 'fail' for the duplicates). In my opinion, this is great. If that behavior will change, then we'll need to figure out something else.
Thanks, Christopher
Fixed with commit c9e9605e6d4f8d8eb3db7997152dcc313d4f4378
Basically, lldpd was returning a NULL portdescr (which is why the output was blank in your original post). This caused ptm to crash while processing it. I added a few checks to handle this condition.
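To illustrate the kind of guard described above, here is a minimal sketch in Python (PTM itself is written in C, and this is not the code from the commit). The function name `neighbor_key` and the dictionary keys are hypothetical; the point is only that a missing port description is checked before it is used.

```python
from typing import Dict, Optional


def neighbor_key(nbr: Dict[str, Optional[str]]) -> Optional[str]:
    """Build "sysname.port" for comparison, or None if a field is missing.

    Hypothetical helper: the field names are assumptions for illustration.
    """
    sysname = nbr.get("sysname")
    port = nbr.get("portdescr")
    # Guard first: a sparse LLDP neighbor may carry neither field.
    if not sysname or not port:
        return None
    return f"{sysname}.{port}"


# A neighbor that only advertises chassis and port MAC addresses.
sparse = {
    "chassis_id": "mac 00:1e:67:c5:66:2e",
    "port_id": "mac 00:1e:67:c5:66:2e",
    "portdescr": None,
}
full = {"sysname": "ln001.example.com", "portdescr": "fpti1_0_1"}

print(neighbor_key(sparse))
print(neighbor_key(full))
```

With the guard in place, the sparse neighbor yields no comparison key instead of an error.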
Regarding the duplicate interfaces in topology.dot: I would like to understand how it is working for you. Based on the current design, PTM will only store one of the entries in its database and use that to compare with the LLDP neighbor. So depending on which entry is stored, the LLDP check could pass or fail when the nbr info is retrieved.
Thanks for fixing the issue I reported.
Regarding duplicate entries in topology.dot, this is how it's working for me currently:
[christopher@tor001 ~]$ ptmctl | grep 49
fpti1_0_49 pass N/A N/A
[christopher@tor001 ~]$ grep 49 /etc/ptm.d/topology.dot
"tor001.example.com":"fpti1_0_49" -- "ln001.example.com":"swp1";
"tor001.example.com":"fpti1_0_49" -- "ln001.example.com":"fpti1_0_1";
[christopher@tor001 ~]$ sudo lldpcli show neighbors | grep -A13 49
Interface: fpti1_0_49, via: LLDP, RID: 5, Time: 8 days, 20:19:02
Chassis:
ChassisID: mac 00:e0:ec:31:4d:ae
SysName: ln001.example.com
SysDescr: ICOS Linux
MgmtIP: 10.143.16.32
Capability: Bridge, off
Capability: Router, off
Capability: Wlan, off
Capability: Station, on
Port:
PortID: ifname fpti1_0_1
PortDescr: fpti1_0_1
-------------------------------------------------------------------------------
... as we don't always know what OS the remote end will be running (but will know the interface number). So we configure for both options. This is working great.
Would it be possible to allow a topology.dot config that supports 'duplicate' entries like this?
Can you show me your outputs for swp1 as well (the same way you showed me for fpti1_0_1)? The above works because PTM is storing the nbr info "ln001.example.com:fpti1_0_1" (the second line in the topo file overrides the first), so I am curious to see how the "swp1" entry works for you.
Also, each time the nbr changes, do you do anything on the host-side PTM (restart, reconfig, etc.)?
Do you need this to be supported just on the GitHub version, or on the Cumulus platform as well? If on Cumulus, then please raise an FR (feature request) through the support channels and we will take a look at it (it would eventually make its way into GitHub).
But to be clear, this is not a supported scenario and I am not sure (yet) how it is working for you.
Hi, you made a good point. I reversed the two topology.dot entries for fpti1_0_49, then restarted ptmd. ptmctl now fails, as ptm seems to be using the "last seen" entry (to your point).
1428690900.895435 2015-04-10 18:35:00 ptm_lldp.c:499 Port fpti1_0_49 NOT matched with remote - Expected [ln001.example.com.swp1] != [ln001.example.com.fpti1_0_1]
Out of curiosity, does topology.dot support pattern matching or wildcards?
No, it does not support wildcards or pattern matching.
So how did it work for you in the first place? I am not able to figure that out.
I think it worked because we were hitting the 'last duplicate' entry for an interface, which happened to match what was actually there.
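The "last duplicate wins" behavior discussed above can be modeled with a short Python sketch. This is an assumption-laden illustration, not PTM's parser: the regular expression, the `load_expected` helper, and the choice to key on the (host, port) pair are all hypothetical. Only the overwrite-on-duplicate behavior comes from the thread.

```python
import re
from typing import Dict, Tuple

# Matches edges of the form "hostA":"portA" -- "hostB":"portB"
EDGE = re.compile(r'"([^"]+)":"([^"]+)"\s*--\s*"([^"]+)":"([^"]+)"')


def load_expected(topology: str) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Map (host, port) to the expected (remote host, remote port)."""
    expected: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for line in topology.splitlines():
        m = EDGE.search(line)
        if m:
            # Same local endpoint seen again: the later entry replaces the earlier one.
            expected[(m.group(1), m.group(2))] = (m.group(3), m.group(4))
    return expected


topo = '''
"tor001.example.com":"fpti1_0_49" -- "ln001.example.com":"swp1";
"tor001.example.com":"fpti1_0_49" -- "ln001.example.com":"fpti1_0_1";
'''
reversed_topo = "\n".join(reversed(topo.strip().splitlines()))

local = ("tor001.example.com", "fpti1_0_49")
lldp_seen = ("ln001.example.com", "fpti1_0_1")  # what lldpd actually reports

for name, text in (("original", topo), ("reversed", reversed_topo)):
    result = "pass" if load_expected(text)[local] == lldp_seen else "fail"
    print(name, result)
```

In this model the original ordering passes and the reversed ordering fails, which is consistent with what was observed after swapping the two entries and restarting ptmd.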
So Chris, were you (re)generating the topo file each time the nbr changed, and it would just happen to be the last entry?
It would just happen to be the last entry. We were not regenerating the topo file. We had even put hostnames on both sides of the "--" in the topo file that didn't belong to the local host. PTM didn't seem to mind that, so I think I assumed duplicate interface lines were OK too.
Does ptm's topology.dot matching compare against PortID or PortDescr?
Port:
PortID: ifname fpti1_0_1
PortDescr: port_1
1428708291.968670 2015-04-10 23:24:51 ptm_lldp.c:499 Port fpti1_0_52 NOT matched with remote - Expected [ln007.example.com.port_1] != [ln007.example.com.fpti1_0_1]
[christopher@tor001 ~]$ sudo lldpcli show neighbors port fpti1_0_52
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface: fpti1_0_52, via: LLDP, RID: 9, Time: 0 day, 00:37:25
Chassis:
ChassisID: mac 00:e0:ec:27:bd:42
SysName: ln007.example.com
SysDescr: ICOS Linux
MgmtIP: 10.143.16.42
Capability: Bridge, off
Capability: Router, off
Capability: Wlan, off
Capability: Station, on
Port:
PortID: ifname fpti1_0_1
PortDescr: port_1
-------------------------------------------------------------------------------
The default is ifName. But if you want to compare on PortDescr, you need to add this to the edge description: LLDP="match_type=portdescr"
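As a rough model of the two match types, the following Python sketch selects which LLDP field is compared against the expected port. The function `port_matches` and the dictionary keys are hypothetical names chosen for illustration. Only the default (ifName) and the `portdescr` option come from the reply above.

```python
from typing import Dict


def port_matches(expected_port: str, nbr: Dict[str, str],
                 match_type: str = "ifname") -> bool:
    """Compare the expected remote port with the field selected by match_type."""
    # Assumed model: "portdescr" selects PortDescr, anything else uses ifName.
    field = "portdescr" if match_type == "portdescr" else "ifname"
    return nbr.get(field) == expected_port


# Values taken from the lldpcli output for fpti1_0_52.
nbr = {"ifname": "fpti1_0_1", "portdescr": "port_1"}

print(port_matches("port_1", nbr))                          # default compares ifName
print(port_matches("port_1", nbr, match_type="portdescr"))  # compares PortDescr
```

Under the default, an expected value of port_1 does not match the reported ifName fpti1_0_1, which mirrors the "NOT matched" log line for that port.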
Ah, that's really good to know. I'm looking at using that in addition to 'ip link set dev XXX_1 alias YYY_1' to match on the PortDescr that 'ip link' can control.
So you will be able to set the alias on your upstream switch port to the same PortDescr value (irrespective of the upstream switch OS)?
That's the intention. On upstream switches we don't control the description; we'll try to use an LLDP template to match on ifName.
Hi there, I am finding that my ptmd service dies whenever the lldpd client output includes something like this:
Interface: fpti1_0_3, via: LLDP, RID: 14, Time: 0 day, 02:23:49
Chassis:
ChassisID: mac 00:1e:67:c5:66:2e
Port:
PortID: mac 00:1e:67:c5:66:2e
(No other fields are seen.) My current workaround is to ignore the affected interfaces using lldpd; however, I'm more interested in seeing whether PTM can withstand this. Thanks