charlestolley / python-snmp

A user-friendly SNMP library
MIT License

Problem when reading some non-existent OIDs #14

Open denisbondar opened 6 months ago

denisbondar commented 6 months ago

Hi, @charlestolley

I'm not sure how to describe this problem precisely, because I haven't found an explanation for it. Maybe you know what is going on, or can help me understand it.

We have several ZTE C610 OLT instances that respond to requests for non-existent OIDs with an unintelligible data packet. On other instances everything works as it should, but on some instances this happens.

For the screenshot below, I executed getBulk() once with timeout=4, and this is the response I see in Wireshark. I can't provide the PCAP itself, as it contains information I can't share publicly.

image

I end up getting a snmp.manager.Timeout exception, although I would expect to get something like noSuchName, noSuchObject, or noSuchInstance, or just an empty VarBindList().

But what does Net-SNMP do? I performed a similar SNMP query using the net-snmp command:

$ snmpbulkwalk -v2c -r1 -t4 10.21.112.82 1.3.6.1.4.1.3902.1015.1010.1.7.4.1.7

And I get the following, quite interesting, result:

image

Upon receiving this unintelligible response from the SNMP agent, net-snmp performs an snmp-get request for the same OID, and receives a clear noSuchObject in response.

image

Is there any way to replicate this behavior inside the getNext() and getBulk() methods?

charlestolley commented 6 months ago

Assuming you are not setting msgMaxSize to something other than the default, this actually seems like a bug in the agent's implementation. If an agent generates a response that is larger than the limit specified in the request, it's supposed to return a tooBig error instead. In this case, since it isn't doing that, the UDP packet, which may be up to 65535 bytes, is getting truncated by the Ethernet layer, which supports a maximum payload of 1500 bytes. Clearly Net-SNMP is more fault-tolerant in this respect: it's able to detect that this is a valid packet that has simply been truncated by the network, so it downgrades to a getNext request, which you would expect to have a smaller payload.

I'm not sure precisely what it would take to enable this type of fault-tolerance. It would probably involve creating a subclass of ParseError to replace the "Incomplete value" error in the decode() function in ber.py. The SNMPv3Message and PDU classes would need to be updated to continue parsing after detecting this error, in order to read the message/request ID, if possible, so that the proper RequestHandle can be notified of the failure, and propagate the error to the caller of getBulk().
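To make that idea a bit more concrete, here is a rough, standalone sketch. It is not the code in ber.py: `ParseError` is a local stand-in, and `IncompleteValueError`, the `lenient` flag, `find_request_id()`, and the hand-built `SAMPLE` message are all hypothetical names invented for illustration. It only handles the community-based (v1/v2c) message layout, and it assumes the truncation happens after the request-id field.

```python
class ParseError(Exception):
    """Local stand-in for the library's ParseError (assumption)."""


class IncompleteValueError(ParseError):
    """Hypothetical subclass: the declared length runs past the end of the data."""

    def __init__(self, tag, declared, available):
        super().__init__(
            f"Incomplete value: tag 0x{tag:02x} declares {declared} bytes, "
            f"only {available} available"
        )
        self.tag = tag
        self.declared = declared
        self.available = available


def decode(data, lenient=False):
    """Decode one BER TLV and return (tag, value, remainder).

    With lenient=True, a value that is cut short is returned as-is
    instead of raising IncompleteValueError.
    """
    if len(data) < 2:
        raise ParseError("Missing tag or length")
    tag, length, offset = data[0], data[1], 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        if count == 0:
            raise ParseError("Indefinite length is not supported")
        if len(data) < offset + count:
            raise ParseError("Missing length bytes")
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count
    value = data[offset:offset + length]
    if len(value) < length and not lenient:
        raise IncompleteValueError(tag, length, len(value))
    return tag, value, data[offset + length:]


def find_request_id(message):
    """Best-effort request-id lookup in a possibly truncated v1/v2c message."""
    _, body, _ = decode(message, lenient=True)  # outer SEQUENCE, may be cut short
    _, _, body = decode(body)                   # version
    _, _, body = decode(body)                   # community
    _, pdu, _ = decode(body, lenient=True)      # PDU, may be cut short
    tag, value, _ = decode(pdu)                 # request-id must be complete
    if tag != 0x02:
        raise ParseError("Expected INTEGER request-id")
    return int.from_bytes(value, "big", signed=True)


# Hand-built v2c Response with an empty VarBindList (illustrative only)
SAMPLE = bytes.fromhex(
    "301b"              # SEQUENCE, 27 bytes
    "020101"            # version = 1 (SNMPv2c)
    "04067075626c6963"  # community = "public"
    "a20e"              # Response PDU, 14 bytes
    "020412345678"      # request-id = 0x12345678
    "020100"            # error-status = 0
    "020100"            # error-index = 0
    "3000"              # empty VarBindList
)
```

With this, `decode(SAMPLE[:22])` raises `IncompleteValueError`, while `find_request_id(SAMPLE[:22])` still recovers the request ID, which is the piece of information needed to find the matching RequestHandle.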

In the meantime, you will probably need to manually limit the number of values you request at once, so that the agent will not generate responses that are too large.
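One way to automate that workaround is to shrink max-repetitions whenever a request times out. The sketch below is only an illustration: `get_bulk_shrinking()` is a hypothetical helper, the local `Timeout` class stands in for snmp.manager.Timeout so the example runs offline, and `bulk` is any callable you supply that sends a single getBulk request for the given OID and repetition count.

```python
class Timeout(Exception):
    """Local stand-in for snmp.manager.Timeout (assumption)."""


def get_bulk_shrinking(bulk, oid, max_repetitions=10, timeout_exc=Timeout):
    """Call bulk(oid, n), halving n after each timeout.

    Returns (response, n), where n is the repetition count that worked.
    Re-raises the timeout once n == 1 has also failed.
    """
    n = max_repetitions
    while True:
        try:
            return bulk(oid, n), n
        except timeout_exc:
            if n <= 1:
                raise  # even the smallest request got no usable response
            n //= 2    # ask for fewer values so the response is smaller
```

In real use, `bulk` would be a small wrapper around the manager's getBulk() call and `timeout_exc` would be snmp.manager.Timeout; the exact keyword used to pass max-repetitions should be checked against the library documentation. Note that each failed attempt still costs a full timeout, so it is worth remembering the value that worked for a given agent.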

denisbondar commented 6 months ago

In general, ZTE's equipment is quite unstable; it is hard to argue with that.

To check, I reduced max-repetitions to 2 (it was 10) and got a correct response (from Wireshark's point of view).

image

But in any case, I see that net-snmp makes an additional snmp-get request to explicitly get noSuchObject, and this idea seems quite good to me.

Maybe python-snmp should adopt that practice as well, including within the getNext and getBulk methods? Instead of repeating snmp-get-next or snmp-get-bulk requests until the timeout expires, it could perform an snmp-get request to determine whether the requested OID exists.
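Outside the library, I imagine the fallback looking roughly like the sketch below. Everything here is hypothetical: `bulk_or_probe()` is an invented helper, the local `Timeout` class stands in for snmp.manager.Timeout, and `bulk` and `get` are caller-supplied callables that each send a single request for the given OID.

```python
class Timeout(Exception):
    """Local stand-in for snmp.manager.Timeout (assumption)."""


def bulk_or_probe(bulk, get, oid, timeout_exc=Timeout):
    """Try bulk(oid); if it times out, probe the same OID with get(oid).

    Returns ("bulk", response) or ("get", response), so the caller can
    tell which request produced the answer. If the probe times out as
    well, that timeout propagates to the caller.
    """
    try:
        return "bulk", bulk(oid)
    except timeout_exc:
        # A plain get has a small response, so a noSuchObject answer
        # should arrive intact even if the bulk response did not.
        return "get", get(oid)
```

The caller would then inspect the "get" result for noSuchObject or noSuchInstance before deciding whether to retry the walk with a smaller max-repetitions.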