Uninett / nav

Network Administration Visualized
GNU General Public License v3.0

Work around broken dot1qVlanCurrentTable response from TENDA switches #2610

Open mtadeu opened 1 year ago

mtadeu commented 1 year ago

Thanks for this great software!

NAV 5.6.1, on a Tenda TEG5328P switch. Type: .1.3.6.1.4.1.8072.3.2.10 (tenda@switch from netsnmp)

 ipdevpolld -J inventory -n 192.168.0.2

[ERROR jobs.jobhandler] [inventory 192.168.0.2] Plugin nav.ipdevpoll.plugins.juniperdot1q.JuniperDot1q('192.168.0.2') reported an unhandled failure
--- <exception caught here> ---
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/plugins/juniperdot1q.py", line 64, in handle
    yield super(JuniperDot1q, self).handle()
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/plugins/dot1q.py", line 74, in handle
    yield self._get_tagging_info()
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/plugins/dot1q.py", line 109, in _get_tagging_info
    egress, untagged = yield self._retrieve_vlan_ports()
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/plugins/juniperdot1q.py", line 85, in _retrieve_vlan_ports
    (egress, untagged) = yield super(JuniperDot1q, self)._retrieve_vlan_ports()
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/plugins/dot1q.py", line 118, in _retrieve_vlan_ports
    egress = yield query.get_vlan_current_egress_ports()
  File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/opt/venvs/nav/lib/python3.7/site-packages/nav/mibs/qbridge_mib.py", line 129, in filter_newest_current_entries
    for (time_index, vlan_index), data in sorted(dot1qvlancurrenttable.items())
  File "/opt/venvs/nav/lib/python3.7/site-packages/nav/mibs/qbridge_mib.py", line 129, in <genexpr>
    for (time_index, vlan_index), data in sorted(dot1qvlancurrenttable.items())
builtins.ValueError: not enough values to unpack (expected 2, got 1)

2023-04-05 19:02:03,954 [ERROR jobs.jobhandler] [inventory 192.168.0.2] Job 'inventory' for 192.168.0.2 aborted: Job aborted due to plugin failure (cause=ValueError('not enough values to unpack (expected 2, got 1)'))

Additional context

The last SNMP traffic:

20:00:35.977510 IP 192.168.0.5.53244 > 192.168.0.2.161:  C="public" GetBulk(32)  N=0 M=10 .1.3.6.1.2.1.17.7.1.4.3.1.1
20:00:36.039579 IP 192.168.0.2.161 > 192.168.0.5.53244:  C="public" GetResponse(247)  .1.3.6.1.2.1.17.7.1.4.3.1.1.1="default" .1.3.6.1.2.1.17.7.1.4.3.1.1.192="V192" .1.3.6.1.2.1.17.7.1.4.3.1.2.1=ff_ff_ff_f0 .1.3.6.1.2.1.17.7.1.4.3.1.2.192=00_00_00_00 .1.3.6.1.2.1.17.7.1.4.3.1.3.1=00_00_00_00 .1.3.6.1.2.1.17.7.1.4.3.1.3.192=00_00_00_00 .1.3.6.1.2.1.17.7.1.4.3.1.4.1=ff_ff_ff_f0 .1.3.6.1.2.1.17.7.1.4.3.1.4.192=00_00_00_00 .1.3.6.1.2.1.17.7.1.4.3.1.5.1=1 .1.3.6.1.2.1.17.7.1.4.3.1.5.192=1
20:00:36.043683 IP 192.168.0.5.53244 > 192.168.0.2.161:  C="public" GetBulk(32)  N=0 M=10 .1.3.6.1.2.1.17.7.1.4.2.1.4
20:00:36.107272 IP 192.168.0.2.161 > 192.168.0.5.53244:  C="public" GetResponse(241)  .1.3.6.1.2.1.17.7.1.4.2.1.4.1=ff_ff_ff_f0 .1.3.6.1.2.1.17.7.1.4.2.1.4.192=00_00_00_00 .1.3.6.1.2.1.17.7.1.4.2.1.5.1=ff_ff_ff_f0 .1.3.6.1.2.1.17.7.1.4.2.1.5.192=00_00_00_00 .1.3.6.1.2.1.17.7.1.4.2.1.6.1=2 .1.3.6.1.2.1.17.7.1.4.2.1.6.192=2 .1.3.6.1.2.1.17.7.1.4.2.1.7.1=0 .1.3.6.1.2.1.17.7.1.4.2.1.7.192=0 .1.3.6.1.2.1.17.7.1.4.3.1.1.1="default" .1.3.6.1.2.1.17.7.1.4.3.1.1.192="V192"

I tried removing juniperdot1q while keeping dot1q=, and the error still occurred. Then I removed dot1q as well (#dot1q=), and the job ran without error.

snmpwalk -v2c -c public 192.168.0.2 .1.3.6.1.2.1.17.7.1.4.3
iso.3.6.1.2.1.17.7.1.4.3.1.1.1 = STRING: "default"
iso.3.6.1.2.1.17.7.1.4.3.1.2.1 = Hex-STRING: FF FF FF F0 
iso.3.6.1.2.1.17.7.1.4.3.1.3.1 = Hex-STRING: 00 00 00 00 
iso.3.6.1.2.1.17.7.1.4.3.1.4.1 = Hex-STRING: FF FF FF F0 
iso.3.6.1.2.1.17.7.1.4.3.1.5.1 = INTEGER: 1

Workaround: How can I configure NAV not to use dot1q on switches of that type?

TIA

lunkwill42 commented 1 year ago

Hi @mtadeu, there is currently no way to configure a plugin exemption for a specific device, and it might be better to just make a proper workaround for a buggy response (the code is fine with no response, but doesn't always handle buggy responses well).

Thanks for the traceback log and SNMP output! According to the traceback you are likely looking at the wrong SNMP traffic, though.

The buggy response seems to come when querying either the dot1qVlanCurrentEgressPorts or dot1qVlanCurrentUntaggedPorts objects (both in the dot1qVlanCurrentTable from RFC 4363).

The dot1qVlanCurrentTable should have a two-value index (dot1qVlanTimeMark, dot1qVlanIndex), but the error message from your log excerpt seems to indicate that the device responds with values that have only a single-value index. What does it look like when you do an snmpwalk of dot1qVlanCurrentTable?
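
To illustrate why a single-value index produces exactly this ValueError, here is a minimal sketch. It is not NAV's actual code: the two dictionaries and the function name `vlan_indexes` are made up for illustration, and only the unpacking pattern is borrowed from the failing line in the traceback.

```python
# Hypothetical, simplified stand-ins for the table rows NAV receives.
# Each key is the row index, i.e. the OID parts after the column number.
compliant = {(0, 1): "ports", (0, 192): "ports"}  # (time_mark, vlan_index)
broken = {(1,): "ports", (192,): "ports"}         # vlan_index only


def vlan_indexes(table):
    """Unpack every row index as a (time_mark, vlan_index) pair."""
    return [vlan_index for (time_mark, vlan_index), _data in sorted(table.items())]


print(vlan_indexes(compliant))

try:
    vlan_indexes(broken)
except ValueError as error:
    # Same kind of failure as in the traceback: a one-element index
    # cannot be unpacked into two names.
    print("ValueError:", error)
```

The compliant table unpacks cleanly, while the one-element keys raise the error before any VLAN data is looked at.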

mtadeu commented 1 year ago

That's the problem: dot1qVlanCurrentTable: Unknown Object Identifier (Sub-id not found: (top) -> dot1qVlanCurrentTable)

There is a proprietary MIB. Maybe it can help us: G5328P-MIB.zip

mtadeu commented 1 year ago

snmpwalk full: snmpwalk-tenda.zip

lunkwill42 commented 1 year ago

That's the problem: dot1qVlanCurrentTable: Unknown Object Identifier (Sub-id not found: (top) -> dot1qVlanCurrentTable)

This message isn't a statement about your switch. This error just means that the snmp command on your system could not locate a MIB file that defined an object named dot1qVlanCurrentTable, so the snmp command did not know what kind of query to send to the switch. You need to have the MIB definition files available on your system for the snmp command line programs to be able to look up things based on names.

snmpwalk full: snmpwalk-tenda.zip

Now that was more helpful, and served to confirm my suspicion that your switch has a buggy implementation of RFC 4363. These are the salient lines from your log:

iso.3.6.1.2.1.17.7.1.4.2.1.1.1 = Gauge32: 0
iso.3.6.1.2.1.17.7.1.4.2.1.2.1 = Gauge32: 1
iso.3.6.1.2.1.17.7.1.4.2.1.3.1 = Gauge32: 1
iso.3.6.1.2.1.17.7.1.4.2.1.4.1 = Hex-STRING: FF FF FF F0 
iso.3.6.1.2.1.17.7.1.4.2.1.5.1 = Hex-STRING: FF FF FF F0 
iso.3.6.1.2.1.17.7.1.4.2.1.6.1 = INTEGER: 2
iso.3.6.1.2.1.17.7.1.4.2.1.7.1 = Timeticks: (0) 0:00:00.00

These responses represent the 7 columns of the dot1qVlanCurrentTable (namely dot1qVlanTimeMark, dot1qVlanIndex, dot1qVlanFdbId, dot1qVlanCurrentEgressPorts, dot1qVlanCurrentUntaggedPorts, dot1qVlanStatus, and dot1qVlanCreationTime).

However, the output clearly shows that one index element is missing from this response. The dot1qVlanTimeMark value should be the first element of the index, while dot1qVlanIndex should be the second. It seems the dot1qVlanTimeMark is missing from your switch's response. The correct output should look like this:

iso.3.6.1.2.1.17.7.1.4.2.1.1.0.1 = Gauge32: 0
iso.3.6.1.2.1.17.7.1.4.2.1.2.0.1 = Gauge32: 1
iso.3.6.1.2.1.17.7.1.4.2.1.3.0.1 = Gauge32: 1
iso.3.6.1.2.1.17.7.1.4.2.1.4.0.1 = Hex-STRING: FF FF FF F0 
iso.3.6.1.2.1.17.7.1.4.2.1.5.0.1 = Hex-STRING: FF FF FF F0 
iso.3.6.1.2.1.17.7.1.4.2.1.6.0.1 = INTEGER: 2
iso.3.6.1.2.1.17.7.1.4.2.1.7.0.1 = Timeticks: (0) 0:00:00.00

We could potentially make a workaround in NAV that ignores a broken index and assumes that the missing bit is the dot1qVlanTimeMark (which I don't think NAV uses for anything, anyway). This heuristic would break down horribly if a device comes along that instead dropped the dot1qVlanIndex value in favor of the former ;-)
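
A rough sketch of what such a heuristic could look like, assuming the table arrives as a dictionary keyed by index tuples. The function name `normalize_current_table` and the default time mark of 0 are assumptions for illustration, not existing NAV code.

```python
def normalize_current_table(table, assumed_time_mark=0):
    """Return a copy of the table whose keys are all (time_mark, vlan_index).

    A one-element index is assumed to be a bare dot1qVlanIndex with the
    dot1qVlanTimeMark missing. Rows with any other unexpected index
    length are dropped rather than guessed at.
    """
    result = {}
    for index, row in table.items():
        index = tuple(index)
        if len(index) == 1:
            # Heuristic: prepend the missing time mark.
            index = (assumed_time_mark,) + index
        if len(index) == 2:
            result[index] = row
    return result


broken = {(1,): "default", (192,): "V192"}
print(normalize_current_table(broken))
```

As noted, the guess is only safe as long as the missing element really is the time mark; the sketch has no way to detect the opposite case.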

In any case, you should really file a bug report with TENDA for this buggy behavior.

hmpf commented 1 year ago

@mtadeu if you can't spot the error, note that there are 14 "columns" in each OID in your log, and there should be 15. A column of 0's is missing just before the last column ahead of the equals sign.

mtadeu commented 1 year ago

This switch has a workaround for MIB access! It modifies the output of dot1qVlanCurrentEntry. See walk: tenda.dot1qVlanCurrentEntry.zip

The inventory job runs OK, but I get an error on topo:

2023-05-08 17:50:56,697 [ERROR jobs.jobhandler] [topo 192.168.0.2] Caught exception during save. Last manager = CamManager(<class 'nav.ipdevpoll.shadows.cam.Cam'>, 'ContainerRepository'(...)). Last model = <class 'nav.ipdevpoll.shadows.cam.Cam'>
Traceback (most recent call last):
  File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type macaddr: "8a:e1:dc:41:5c"
LINE 1: ...05-08T17:50:56.696643'::timestamp, 'infinity', 0, '8a:e1:dc:...

2023-05-08 17:50:56,702 [ERROR jobs.jobhandler] [topo 192.168.0.2] Job 'topo' for 192.168.0.2 aborted: Job aborted due to save failure (cause=DataError('invalid input syntax for type macaddr: "8a:e1:dc:41:5c"\nLINE 1: ...05-08T17:50:56.696643\'::timestamp, \'infinity\', 0, \'8a:e1:dc:...\n ^\n'))
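
A note on the log above: the rejected value "8a:e1:dc:41:5c" has only five octets, while a MAC address has six, which is why PostgreSQL refuses it as a macaddr. A minimal sketch of a check that would catch such a value before saving, assuming only the colon-separated notation seen in the log (PostgreSQL itself accepts a few other notations as well). `is_valid_mac` is a hypothetical helper, not part of NAV.

```python
import re

# Six groups of two hex digits separated by colons, e.g. 8a:e1:dc:41:5c:00
MAC_PATTERN = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}", re.IGNORECASE)


def is_valid_mac(text):
    """Return True if text is a six-octet, colon-separated MAC address."""
    return MAC_PATTERN.fullmatch(text) is not None


print(is_valid_mac("8a:e1:dc:41:5c"))     # the five-octet value from the log
print(is_valid_mac("8a:e1:dc:41:5c:00"))  # an illustrative six-octet value
```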

mtadeu commented 1 year ago

We could potentially make a workaround in NAV that ignores a broken index and assumes that the missing bit is the dot1qVlanTimeMark (which I don't think NAV uses for anything, anyway). This heuristic would break down horribly if a device comes along that instead dropped the dot1qVlanIndex value in favor of the former ;-)

Maybe do it only if the type is .1.3.6.1.4.1.8072.3.2.10
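
Such a restriction could be sketched roughly as below. The set and the function name are hypothetical, not NAV configuration or code. One caveat, as far as I can tell: this type value looks like net-snmp's generic identifier for agents running on Linux rather than something Tenda-specific, so a check like this may also match unrelated devices.

```python
# Assumption: the only affected device type is the one reported in this issue.
AFFECTED_SYSOBJECTIDS = {".1.3.6.1.4.1.8072.3.2.10"}


def needs_index_workaround(sysobjectid):
    """Return True if the device type is on the hypothetical workaround list.

    Accepts the OID with or without a leading dot.
    """
    return "." + sysobjectid.strip(".") in AFFECTED_SYSOBJECTIDS


print(needs_index_workaround("1.3.6.1.4.1.8072.3.2.10"))
```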

In any case, you should really file a bug report with TENDA for this buggy behavior.

I opened a bug report with TENDA support.

mtadeu commented 1 year ago

Hi,

Did you see that dot1qVlanCurrentTable can be accessed with another community? The Tenda switch has a workaround!

But this causes another problem. See https://github.com/Uninett/nav/issues/2610#issuecomment-1539038475