Incorrect handling of last timeout (causes retransmit; should not)

calmh / node-snmp-native

Native Javascript SNMP library for Node.js

MIT License

252 stars 66 forks source link

Incorrect handling of last timeout (causes retransmit; should not) #24

Closed averas closed 8 years ago

averas commented 9 years ago

I've noticed that I get timeouts sometimes:

Response with unknown request ID from x.x.x.x. Consider increasing timeouts (30112 ms old?).

Looking at the traffic using tcpdump I can see that the original packet did timeout, and there is a re-transmit carried out after 30s. The answer, however, is discarded with the above error message.

Looking at the code it seems that the request id containing the timestamp of the original packet is not updated for the re-transmit, which means that the packet is in effect timed out before it is even re-transmitted.

calmh commented 9 years ago

Not changing the request ID is correct - it's the same requests we're sending again, and a delayed response to the first packet should satisfy the request in the retry as well. I'm not sure how you end up with a packet being both retransmitted and removed due to timeout at the same time, what are your timeout intervals set to?

averas commented 9 years ago

It seems the problem was that I had set the timeout to [ 30000 ], meaning no retries, however, what I see is:

0s: First get-packet is sent --No response received-- 30s: A second get packet is sent, although there should be no retries 30.1s: A response arrives this time, but the response is disregarded with an error

When I changed the timeout to an array things worked better, however I still wonder whether the last timeout will still send an extra packet.

calmh commented 9 years ago

It's buggy for the last timeout interval in the list. The packet is transmitted, then the next item in list of timeouts is consulted to schedule a retransmit in the future - if the list is now empty, the request is dropped. Instead, the last interval should be used for dropping the request and not for doing a retransmission.

So [30000] should mean no retries and timeout after 30 seconds, but it doesn't work that way currently. Well, you get the timeout after 30 seconds, but it sends a last unnecessary packet too.

MortenGuldager commented 8 years ago

UPDATE: 11.11.2015: With wireshark I have actually seen some devices sending the same response more than once. I don't think am affected by this bug at all.

I see this too, but only (it seems) when doing massively getAll requests over a larger (50~100) set of devices in parallel. My timeouts are [1000, 2000, 4000], but I just got errors like:

Response with unknown request ID from x.y.z.149. Consider increasing timeouts (12 ms old?).
Response with unknown request ID from x.y.z.149. Consider increasing timeouts (9 ms old?).
Response with unknown request ID from x.y.z.8. Consider increasing timeouts (5 ms old?).
Response with unknown request ID from x.y.z.25. Consider increasing timeouts (5 ms old?).
Response with unknown request ID from x.y.z.13. Consider increasing timeouts (172 ms old?).
Response with unknown request ID from x.y.z.6. Consider increasing timeouts (6 ms old?).

ralcini commented 8 years ago

Hi, any update about the possible fix? Best regards!

calmh commented 8 years ago

I don't actually use this currently, but bug fix PRs are always welcome. :)

ralcini commented 8 years ago

We can close issue? =) Fixed at 1.0.19