Open kamakazikamikaze opened 9 years ago
Looks like a bug. It's happening at the point where it's looking at the management IPs for CDP neighbors on akh-114-a. It could be bad data from snmp. Does the cdp neighbor info on that switch have any weird things going on?
edit: The depth is really only significant for limiting the used stack space. Since the thing is recursive we don't want to go 1000 functions deep. I'm pretty confident that Python could handle a fairly large depth value, certainly much more than 10. So I don't think you're going to have any trouble from that aspect.
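To illustrate the point about depth, here is a minimal sketch of a depth-limited recursive crawl. It is not the mnet code: the `crawl` function, the `visited` set, and the toy adjacency list are all assumptions made for the example. The recursion never goes more than `depth + 1` frames deep, which is far below Python's default recursion limit of 1000.

```python
def crawl(graph, node, depth, visited=None):
    """Visit `node` and recurse into its neighbors until `depth` runs out."""
    if visited is None:
        visited = set()
    if node in visited:
        return visited          # already seen, e.g. via a redundant uplink
    visited.add(node)
    if depth <= 0:
        return visited          # depth budget spent: record the node, do not descend
    for neighbor in graph.get(node, []):
        crawl(graph, neighbor, depth - 1, visited)
    return visited


# Hypothetical adjacency list; redundant feeds show up as repeated neighbors.
toy = {
    "core": ["akh-1a-a", "akh-1a-a"],
    "akh-1a-a": ["core", "akh-114-a", "akh-114-a"],
    "akh-114-a": ["akh-1a-a", "xirrus-ap"],
}
print(sorted(crawl(toy, "core", 1)))
print(sorted(crawl(toy, "core", 10)))
```

With a visited set like this, redundant links between two stacks cannot cause a loop, whatever the depth value.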
Funnily enough I'm currently SSH'd into that. The switch is a C3750 stack with two members. The G1/0/1 ports on both are connected to the AKH switch stack (one as a redundant in case of failure). Perhaps we're stuck in a loop? I would doubt it because many, if not all, C3750 and C3850 stacks have at least one redundant feed and a depth of 1 ran successfully.
Here's the CDP output.
akh-114-a#sho cdp nei
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
akh-1a-a.dcs.byu.edu
Gig 2/0/1 127 S I WS-C3750- Gig 1/0/2
akh-1a-a.dcs.byu.edu
Gig 1/0/1 165 S I WS-C3750- Gig 2/0/2
SEP000DBCCC6AF7 Fas 1/0/31 143 H P M IP Phone Port 1
SEP000DBCCC6750 Fas 1/0/32 126 H P M IP Phone Port 1
SEP000DBCD9409D Fas 1/0/28 157 H P M IP Phone Port 1
SEP00082166EDC6 Fas 1/0/6 162 H P M IP Phone Port 1
SEP0006D74B186C Fas 1/0/25 138 H P M IP Phone Port 1
SEP081FF3636C98 Fas 1/0/20 179 H P M IP Phone Port 1
SEP0007EB2F8489 Fas 1/0/35 150 H P M IP Phone Port 1
SEP000821D1B912 Fas 1/0/11 172 H P M IP Phone Port 1
SEP64AE0C5FDBCE Fas 1/0/7 167 H P M IP Phone Port 1
SEPD0574C6AC535 Fas 1/0/38 125 H P M IP Phone Port 1
SEP1C1D86C56A95 Fas 1/0/39 161 H P M IP Phone Port 1
SEP000DBCCC6C72 Fas 1/0/29 171 H P M IP Phone Port 1
SEPA0CF5B801459 Fas 1/0/48 147 H P M IP Phone Port 1
SEPA418758ADC75 Fas 1/0/41 124 H P M IP Phone Port 1
SEP54781A1CFC91 Fas 1/0/47 123 H P M IP Phone Port 1
SEP000DBCD94099 Fas 1/0/19 150 H P M IP Phone Port 1
SEP00097CEC99B4 Fas 1/0/21 124 H P M IP Phone Port 1
SEP3037A616B384 Fas 1/0/46 123 H P M IP Phone Port 1
AKH-114-100-AN1 Fas 1/0/1 9 S Xirrus XR Gig1
Nothing here seems out of the ordinary (from my novice observation). I know that redundant feeds aren't an issue because the depth 1 run had this in its 'finished' output:
[None] dc-3n604e-a-corea:gi7/21 -(UNKNOWN)-> dc-3n620e-b-cr11:gi7/10
UNKNOWN -> None
[None] dc-3n604e-a-corea:gi7/22 -(UNKNOWN)-> dc-3n620e-b-cr11:gi7/7
UNKNOWN -> None
[None] dc-3n604e-a-corea:gi7/23 -(UNKNOWN)-> dc-3n620e-b-cr11:gi7/8
UNKNOWN -> None
[None] dc-3n604e-a-corea:gi7/24 -(UNKNOWN)-> dc-3n620e-b-cr11:gi7/9
Router AKH-1A-A doesn't have anything out of the ordinary either:
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
dc-3n819e-a-ca.dcs.byu.edu
Gig 2/0/1 134 R S I WS-C6509- Gig 2/11
dc-3n819e-a-ca.dcs.byu.edu
Gig 1/0/1 174 R S I WS-C6509- Gig 1/11
akh-300a-a.dcs.byu.edu
Gig 2/0/3 174 S I WS-C3750- Gig 1/0/1
akh-300a-a.dcs.byu.edu
Gig 1/0/3 176 S I WS-C3750- Gig 2/0/1
SEP081FF363692E Fas 1/0/27 175 H P M IP Phone Port 1
SEPC0626B63F6D4 Fas 1/0/28 135 H P M IP Phone Port 1
SEP64AE0CF7D2E3 Fas 1/0/2 139 H P M IP Phone Port 1
SEP000750036DE6 Fas 1/0/26 146 H P M IP Phone Port 1
AKH-114-134-AN1 Fas 1/0/12 5 S Xirrus XR Gig1
akh-114-a.dcs.byu.edu
Gig 2/0/2 173 S I WS-C3750- Gig 1/0/1
akh-114-a.dcs.byu.edu
Gig 1/0/2 159 S I WS-C3750- Gig 2/0/1
EDIT: I found that another member, AKH-114-100-AN1, was at the bottom of AKH-114-A's list which would have been printed out next. You don't have to proceed unless my findings are wrong. I'll leave this here for further debugging
So AKH-1A-A seems to be done, as it has already been visited. Since it's a child node of 'dc-3n819e-a-ca' I decided to check that out:
dc-3n819e-a-ca#sho cdp nei
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
lves-w160e-a.dcs.byu.edu
Gig 2/12 158 S I WS-C3750- Gig 2/0/1
lves-w160e-a.dcs.byu.edu
Gig 1/12 172 S I WS-C3750- Gig 1/0/1
akh-1a-a.dcs.byu.edu
Gig 2/11 125 S I WS-C3750- Gig 2/0/1
akh-1a-a.dcs.byu.edu
Gig 1/11 147 S I WS-C3750- Gig 1/0/1
TRPB-110-A1.dcs.byu.edu
Gig 2/18 152 S I WS-C3560- Gig 0/1
dc-3n620e-b-cr11.dcs.byu.edu
Gig 2/1 135 R S I WS-C6509- Gig 8/37
dc-3n620e-b-cr11.dcs.byu.edu
Gig 2/17 167 R S I WS-C6509- Gig 8/12
dc-3n620e-b-cr11.dcs.byu.edu
Gig 1/17 175 R S I WS-C6509- Gig 7/12
dc-3n620e-b-cr11.dcs.byu.edu
Gig 1/1 145 R S I WS-C6509- Gig 7/37
mb-150-a.dcs.byu.edu
Gig 1/5 170 S I WS-C3750- Gig 1/0/1
mlrp-144-a2.dcs.byu.edu
Gig 2/9 131 S I WS-C3560- Gig 0/1
cone-225-a1.dcs.byu.edu
Gig 2/6 179 S I WS-C3560- Gig 0/1
mlrp-144-a1.dcs.byu.edu
Gig 1/9 140 S I WS-C3560- Gig 0/1
cone-225-a1.dcs.byu.edu
Gig 1/6 179 S I WS-C3560- Gig 0/2
alln-339-a1.dcs.byu.edu
Gig 2/8 140 S I WS-C3560- Gig 0/1
CANC-161-a1.dcs.byu.edu
Gig 2/10 130 S I WS-C3560- Gig 0/4
CANC-161-a1.dcs.byu.edu
Gig 1/10 128 S I WS-C3560- Gig 0/1
BRMB-151-A.dcs.byu.edu
Ten 8/7 178 S I WS-C4506- Ten 1/2
BRMB-151-A.dcs.byu.edu
Ten 7/7 131 S I WS-C4506- Ten 1/1
BRMB-265-A.dcs.byu.edu
Ten 8/6 154 S I WS-C4506- Ten 1/1
BRMB-265-A.dcs.byu.edu
Ten 7/6 129 S I WS-C4506- Ten 1/2
mc-1212-a.dcs.byu.edu
Gig 2/14 176 S I WS-C3750V Gig 2/0/1
mc-1212-a.dcs.byu.edu
Gig 1/14 128 S I WS-C3750V Gig 1/0/1
ppt3-loc3-a.dcs.byu.edu
Gig 2/2 129 S I WS-C3750G Gig 2/0/10
ppt3-loc3-a.dcs.byu.edu
Gig 2/16 136 S I WS-C3750G Gig 2/0/12
ppt3-loc3-a.dcs.byu.edu
Gig 1/16 177 S I WS-C3750G Gig 1/0/12
ppt3-loc3-a.dcs.byu.edu
Gig 1/2 177 S I WS-C3750G Gig 1/0/10
mp-190-a1.dcs.byu.edu
Gig 1/19 124 S I WS-C3560- Gig 0/1
dc-4n442e-a1-RADIO.dcs.byu.edu
Gig 1/18 139 S I WS-C3750X Gig 1/0/1
conf-loc2-b.dcs.byu.edu
Gig 2/15 129 S I WS-C4506- Gig 1/4
conf-loc2-b.dcs.byu.edu
Gig 2/3 126 S I WS-C4506- Gig 1/6
conf-loc2-b.dcs.byu.edu
Gig 1/15 139 S I WS-C4506- Gig 1/3
conf-loc2-b.dcs.byu.edu
Gig 1/3 138 S I WS-C4506- Gig 1/5
dc-3n504e-a-coreb.dcs.byu.edu
Ten 8/5 154 R S I WS-C6509- Ten 8/14
dc-3n604e-a-corea.dcs.byu.edu
Ten 7/5 154 R S I WS-C6509- Ten 8/14
ldsp-106a-a.dcs.byu.edu
Gig 2/13 160 S I WS-C3750- Gig 2/0/1
ldsp-106a-a.dcs.byu.edu
Gig 1/13 160 S I WS-C3750- Gig 1/0/1
So let's sort it out:
I don't know if your code has already entered LVES at this point because it's not printed out ('....LVES-W160E-A ( << IP >>)'). Assuming that it has, here is the CDP report:
lves-w160e-a#sho cdp nei
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
dc-3n819e-a-ca.dcs.byu.edu
Gig 2/0/1 129 R S I WS-C6509- Gig 2/12
dc-3n819e-a-ca.dcs.byu.edu
Gig 1/0/1 141 R S I WS-C6509- Gig 1/12
LVES-E110F-A1.dcs.byu.edu
Gig 1/0/2 143 S I WS-C3560- Gig 0/1
lves-l334a-a.dcs.byu.edu
Gig 3/0/1 157 S I WS-C3750- Gig 2/0/1
lves-l334a-a.dcs.byu.edu
Gig 2/0/2 152 S I WS-C3750- Gig 1/0/1
sttw-100-a1.dcs.byu.edu
Fas 2/0/48 160 S I WS-C3560- Gig 0/1
SEP000DBCCC7127 Fas 1/0/1 127 H P M IP Phone Port 1
SEP000DBCD9409E Fas 1/0/2 130 H P M IP Phone Port 1
SEPB8621F6D8D5C Fas 1/0/17 132 H P M IP Phone Port 1
lves-w126b-a1.dcs.byu.edu
Gig 3/0/4 172 S I WS-C3560- Gig 0/2
lves-w126b-a1.dcs.byu.edu
Gig 3/0/3 172 S I WS-C3560- Gig 0/1
SEP081FF3638032 Fas 1/0/13 133 H P M IP Phone Port 1
lves-n101-a1.dcs.byu.edu
Gig 1/0/4 136 S I WS-C3560- Gig 0/1
lves-n155-a1.dcs.byu.edu
Fas 2/0/45 152 S I WS-C3560- Gig 0/1
SEP000DBCE97083 Fas 1/0/3 135 H P M IP Phone Port 1
lves-e194-a1.dcs.byu.edu
Gig 1/0/3 172 S I WS-C3560- Gig 0/1
SEP000DBCCC6C88 Fas 1/0/18 143 H P M IP Phone Port 1
lves-n195-a1.dcs.byu.edu
Fas 2/0/44 123 S I WS-C2940- Gig 0/1
lves-s156-a1.dcs.byu.edu
Gig 2/0/4 150 S I WS-C3560- Gig 0/1
lves-w104-a1.dcs.byu.edu
Gig 2/0/3 129 S I WS-C3560- Gig 0/1
At this point I'm not sure. If you need LVES-E110F's CDP report I can add that.
Whoops! Just noticed something. I know I went overboard here so let's step back near the top. Check AKH-114-A's CDP report. At the bottom of its list it shows:
AKH-114-100-AN1 Fas 1/0/1 9 S Xirrus XR Gig1
This isn't in the output. So somehow it's stuck here. If you print output as soon as a device is found then it hasn't tried to enter it yet. If you aren't printing until after you check an SNMP request and look at the CDP table then maybe it's stuck here. I'll edit this when I check what it holds.
EDIT
AKH-114-100-AN1 10.23.69.20 Xirrus XR520, 512MB (300MHz) Gig1 none L2SW(switch)
Let me see if other Xirrus devices include themselves (or nothing) in their tables. Perhaps the device doesn't have CDP enabled. It wouldn't show in AKH-114-A's table if so, but our Xirrus devices don't play fair with Cisco's equipment.
EDIT2
CDP is enabled. So I'm a little stumped. I'm running a depth 7 instance to see if it skips over this problem. I'll try a depth of 6 if I do
Would you be comfortable changing some of the code? I'd like to see what OID it polls and what the return value is at this point.
In graph.py, around line 151, try adding these print()s between these two lines:
rip = snmpobj.cache_lookup(cdp_vbtbl, OID_CDP_IPADDR + '.' + ifidx + '.' + t[15])
print('cdp "%s" = "%s"' % (name, val))
print('cache_lookup(cdp, "%s") = "%s"' % ((OID_CDP_IPADDR + '.' + ifidx + '.' + t[15]), rip))
rip = convert_ip_int_str(rip)
Before doing this I just noticed this isn't a Cisco device. Can you do a show cdp n fa1/0/1 de on akh-114-a and see if there's a management IP address advertised?
I'll enter that once our other instances have finished. I understand that the scripts are compiled into a .pyc at runtime but I'm unaware of any potential problems if it tries to compile with the updated code while another .pyc is open and active in the directory.
Also, should/could we make it print out to a log instead of to the screen? I'm trying to run these in the background
You might be able to redirect the output from stdout and stderr to a file and fork it to the background?
# ( mnet.py -r 10.10.10.10 &> ./10.10.10.10.log ) &
not tested
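As an alternative to shell redirection, the script itself could write to a log file with Python's standard `logging` module. This is only a sketch: `make_logger`, the logger name, and the log path are hypothetical and not part of mnet.

```python
import logging
import os
import tempfile


def make_logger(path):
    """Return a logger that appends timestamped lines to `path`."""
    logger = logging.getLogger("mnet_crawl_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# Example: log to a file in the system temp directory instead of the screen.
log_path = os.path.join(tempfile.gettempdir(), "mnet_sketch.log")
log = make_logger(log_path)
log.info("crawling %s", "10.10.10.10")
```

Because the handler writes straight to the file, the process can run in the background without any terminal attached.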
I'm using nohup and I figured it out. Very similar to your line. I'll add it in and run it at depth 10 again. I won't be able to report back until tomorrow when I'm back at work
EDIT
It may have actually been a result of our server dropping the connection to the vlan we need to access these devices. I reset my settings and am trying again. Time to get out of this cubicle!
Here's a log of depth 10 from the same starting node.
Same starting point but depth of 8.
Here's the log from an attempt of depth 2 in a different building.
Thus far I have had only one process successfully complete. All others have had this same issue.
Alright, looks like it's returning an empty string when you query the IP address OID.
Try changing line 188 of util.py from
if (iip != None):
to
if ((iip != None) | (iip != '')):
I added that half an hour ago and have it running. Question though now that I've re-read it, should it be
if ((iip != None) & (iip != '')):
since we don't want to proceed if iip = ''?
Yes! Sorry about that. Doh
On Jul 17, 2015, at 12:57 PM, Kent Coble notifications@github.com wrote:
I added that half an hour ago and have it running. Question though now that I've re-read it, should it be if ((iip != None) & (iip != '')): since we don't want to proceed if iip = ''?
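To see why the two versions of the check differ, here is a small standalone comparison. The function names are made up for the example; only the expressions come from the thread. With `|`, the test is true for every possible value, because no value can equal both None and '' at once.

```python
def passes_or(iip):
    # First suggested check: true when iip differs from None OR differs from ''.
    return (iip != None) | (iip != '')


def passes_and(iip):
    # Corrected check: true only when iip is neither None nor ''.
    return iip is not None and iip != ''


for value in (None, '', '0x0a030159'):
    print(repr(value), passes_or(value), passes_and(value))
```

Note that `|` and `&` are bitwise operators that happen to work on booleans; `or` and `and` are the usual spelling and do not need the extra parentheses.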
I made the edit and ran a shallow instance to test. We've successfully detected when iip = '' but now we're stopped when an UNKNOWN is returned:
cdp "1.3.6.1.4.1.9.9.23.1.2.1.1.6.112.20" = "dc-3n504e-a-coreb.dcs.byu.edu"
cache_lookup(cdp, "1.3.6.1.4.1.9.9.23.1.2.1.1.4.112.20") = "0x0a030159"
cdp "1.3.6.1.4.1.9.9.23.1.2.1.1.6.113.7" = "node0"
cache_lookup(cdp, "1.3.6.1.4.1.9.9.23.1.2.1.1.4.113.7") = ""
Traceback (most recent call last):
File "mnet.py", line 183, in <module>
main(sys.argv[1:])
File "mnet.py", line 61, in main
graph(argv[1:])
File "mnet.py", line 119, in graph
graph.crawl_node(opt_root_ip, opt_depth)
File "/srv/samba/Share/Students/Kent's crap/workspace/mnet/mnetsuite/graph.py", line 198, in crawl_node
self.crawl_node(child, depth-1)
File "/srv/samba/Share/Students/Kent's crap/workspace/mnet/mnetsuite/graph.py", line 157, in crawl_node
if (self.is_node_allowed(rip) == 0):
File "/srv/samba/Share/Students/Kent's crap/workspace/mnet/mnetsuite/graph.py", line 207, in is_node_allowed
ipaddr = IPAddress(ip)
File "/usr/local/lib/python2.7/dist-packages/netaddr/ip/__init__.py", line 306, in __init__
'address from %r' % addr)
netaddr.core.AddrFormatError: failed to detect a valid IP address from 'UNKNOWN'
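For context on the two lookups in that output: the first returns a hex string that maps to a dotted IPv4 address, the second returns an empty string, and the traceback shows 'UNKNOWN' reaching `IPAddress()`. The sketch below shows one way such a conversion could behave. `hex_to_dotted` is a hypothetical helper written for illustration; it is not mnet's `convert_ip_int_str`, whose actual implementation is not shown in this thread.

```python
def hex_to_dotted(value):
    """Convert a string like '0x0a030159' to '10.3.1.89'; anything else to 'UNKNOWN'."""
    if not value or not value.startswith('0x'):
        return 'UNKNOWN'
    digits = value[2:]
    if len(digits) != 8:
        return 'UNKNOWN'        # not exactly four octets
    try:
        octets = [int(digits[i:i + 2], 16) for i in range(0, 8, 2)]
    except ValueError:
        return 'UNKNOWN'        # non-hex characters
    return '.'.join(str(octet) for octet in octets)


print(hex_to_dotted('0x0a030159'))
print(hex_to_dotted(''))
```

Any caller that passes the result on to an IP parser has to handle the 'UNKNOWN' case first, which is where the traceback above comes from.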
Hmm.. I think we should allow these nodes by default. You can add this if-statement to graph.py to get around the error you've run across.
def is_node_allowed(self, ip):
if (ip == 'UNKNOWN'):
return 1
ipaddr = None
if (USE_NETADDR):
But because of that we would also need to stop an unknown from being crawled; they can only be leaves on the graph. So you'll need to add this too:
def crawl_node(self, ip, depth):
if ((self.is_node_allowed(ip) == 0) | (ip == 'UNKNOWN')):
return
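The two guards together can be sketched as standalone functions. This is an illustration only, assuming Python 3 and the standard `ipaddress` module rather than netaddr; `ALLOWED_NETS` and `should_crawl` are hypothetical names, and the real methods live on the graph class.

```python
import ipaddress

ALLOWED_NETS = [ipaddress.ip_network('10.0.0.0/8')]  # hypothetical allow-list


def is_node_allowed(ip):
    """Unknown neighbors may appear on the graph; real IPs must match the allow-list."""
    if ip == 'UNKNOWN':
        return True
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False            # not a parseable address
    return any(addr in net for net in ALLOWED_NETS)


def should_crawl(ip):
    """Only nodes with a usable, allowed IP are descended into."""
    return ip != 'UNKNOWN' and is_node_allowed(ip)


print(is_node_allowed('UNKNOWN'), should_crawl('UNKNOWN'))
print(is_node_allowed('10.23.69.20'), should_crawl('10.23.69.20'))
```

An unknown neighbor is therefore drawn as a leaf but never used as the starting point of a new SNMP query.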
Thanks for all the help btw!
Ok, I've applied the changes and I'll start with a shallow depth run for the sake of time. If that passes I'll run it at a depth of 10 and report when it's all finished.
Glad to see I'm actually contributing. Was this script originally intended for smaller/more layered networks? It looks like output works well if there are fewer devices in breadth and more in depth. Perhaps these issues are arising only because of how poorly implemented our network is :stuck_out_tongue_winking_eye:
EDIT
Depth of 2 passed. Will add new comment when 10 passes. SVG also looks ok but I haven't checked, and likely will not check, whether it's completely accurate.
It worked! Took a long time but it finished and exported to an SVG. Check it out!
Holy cow that's one big diagram! Is that a network at a single location or is it spanning multiple sites somehow? Either way, good luck trying to draw something like that and keep it up to date in Visio, that would be a full time job. Lol.. glad it worked for ya! I see a couple oddities in it though, some work that could be done here and there.
For instance, some of the nodes aren't showing the correct platform. That's something I found rather frustrating as Cisco seems to put them in different places depending on the platform type, which is the exact information I'm trying to find out. sfh-103-b, sfh-strack-a, conf-loc2-b, and rb-239-a are a few. Also noticed asb-b223-b looped back on itself. Could be a transparent bridge is there?
This is our network starting at the entry router. It is indeed connecting to multiple buildings if that's what you mean by sites. Like I said, there are over 3300 VLANs and tons of routers/firewalls/filters for it all.
Let me help you out:
channel-protocol lacp
in it. Maybe that's what you're looking for.
If you need help determining the SNMP values for the specific platforms I may be able to provide them. I actually was looking it up a few weeks ago to check for the PSUs on C4506s. The C3850s will definitely be different than the others since it acts as a stack, thereby reiterating indices (ex. 1000 is switch 1, 2000 is switch 2, ...)
EDIT
Looking at v0.2 (I haven't updated yet) I see that you're checking for:
OID_PLATFORM1 = ENTITY-MIB::entPhysicalDescr
OID_PLATFORM2 = CISCO-ENTITY-ASSET-MIB::ceAssetTag
OID_PLATFORM3 = CISCO-ENHANCED-IMAGE-MIB::ceImageFamily
OID_PLATFORM4 = ENTITY-MIB::entPhysicalDescr
I have an old script that retrieves a platform using OID entPhysicalModelName which works just fine for the devices above. Perhaps you should try that instead?
EDIT 2
If you have bash and the Net-SNMP package correctly installed you could try something along the lines of this:
# Get device model; OIDs 1,1000,or 1001 will have a switch (stack) member
read -a model <<<$(snmpbulkget -v2c -r2 -t1 -c $COMMUNITY $SWITCH ENTITY-MIB::entPhysicalModelName -Oqv)
printf "$model\n"
EDIT 3
Using the snmp command above I get these:
~ $ snmpbulkget -v2c -r2 -t1 -c <Community> sfh-103-b ENTITY-MIB::entPhysicalModelName -Oqv
WS-C6509-E
~ $ snmpbulkget -v2c -r2 -t1 -c <Community> sfh-strack-a ENTITY-MIB::entPhysicalModelName -Oqv
WS-C3850-48P
WS-C3850-48P
~ $ snmpbulkget -v2c -r2 -t1 -c <Community> conf-loc2-b ENTITY-MIB::entPhysicalModelName -Oqv
WS-C4506-E
~ $
snmpbulkget will get all those empty values afterwards so you'll definitely need to trim them.
A 4-member C3850 stack would return this:
WS-C3850-48P
WS-C3850-48P
WS-C3850-48P
WS-C3850-48P
(Just so you know that the SFH-STRACK-A isn't reporting a duplicate)
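Trimming those empty values could look like the sketch below. It assumes the command output has already been captured as one string; `parse_models` is a hypothetical helper, and since it is not certain whether empty entries arrive as blank lines or as `""`, it handles both.

```python
def parse_models(raw_output):
    """Split snmpbulkget -Oqv output into non-empty model names."""
    models = []
    for line in raw_output.splitlines():
        name = line.strip().strip('"')   # drop whitespace and surrounding quotes
        if name:
            models.append(name)
    return models


# Made-up sample shaped like the sfh-strack-a output above, with blank entries.
sample = "WS-C3850-48P\n\n\nWS-C3850-48P\n\n\n\n"
print(parse_models(sample))
```

The length of the returned list then gives the number of stack members, so a two-member stack yields two entries rather than a duplicate.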
I started using the same MIB for stack info, works great. I also added a bit at the very end to check all the nodes to verify we pulled the serial, IOS, and platform. If not, it uses your MIB to pull them directly. So we prefer CDP since it's faster and we assume reliable, and if we don't get it there then we poll it directly with SNMP. I just committed the change and it seems to work well.
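The CDP-first, SNMP-fallback idea can be sketched roughly as follows. This is not the committed code: `fill_missing`, the field names, and the `poll_direct` callable are assumptions for the example, and the fake poller returns made-up values in place of a real SNMP query.

```python
REQUIRED_FIELDS = ('serial', 'ios', 'platform')


def fill_missing(node, poll_direct):
    """Fill fields that CDP left empty by polling the device directly."""
    missing = [field for field in REQUIRED_FIELDS if not node.get(field)]
    if not missing:
        return node              # CDP gave us everything; skip the slower poll
    polled = poll_direct(node['ip'])
    for field in missing:
        if polled.get(field):
            node[field] = polled[field]
    return node


def fake_poll(ip):
    # Stand-in for a direct SNMP query; the values are invented for the example.
    return {'serial': 'FAKE123', 'ios': '12.2', 'platform': 'WS-C4506-E'}


node = {'ip': '10.0.0.1', 'serial': None, 'ios': '15.0', 'platform': ''}
print(fill_missing(node, fake_poll))
```

Values already learned from CDP are kept as they are; only the gaps are filled from the direct poll.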
I ran the latest mnet.py graph solution and encountered an error. I'm not sure if it's because there are too many device entries or if my depth setting is too steep. I did a dry run by setting a depth of 0 and 1 to ensure that all dependencies were properly installed, which finished successfully. This instance was given a depth of 10. Not too sure if that's too deep but it did correctly detect devices down to a depth of 7.
I've uploaded the log to my Google Drive since I cannot directly upload .txt files. Though I've censored it, it still shows all the steps and the depth it was able to reach.