MangoAutomation / BACnet4J

BACnet/IP stack written in Java. Forked from http://sourceforge.net/projects/bacnet4j/
GNU General Public License v3.0
187 stars 111 forks source link

Criticial segmentation bug - can cause network-wide communication problems #8

Closed Frozenlock closed 6 years ago

Frozenlock commented 8 years ago

We noticed a BACnet compatibility issue with some devices.

This particular bug will in turn bring up a bug in BACnet4J which causes it to endlessly spam the target device for the missing segment. The spam is often enough to crash the target device and noticeably slow down the entire network.

mlohbihler commented 7 years ago

Well, that should teach the target device not to mess around with BACnet4J. :) I'll see what i can do to recreate.

mlohbihler commented 7 years ago

Actually, can you describe the problem more fully? The target device does not respect the requested window size when receiving? It asks for 4 segments and then waits for an ack after two. What does this cause BACnet4J to do? And what should it do?

Frozenlock commented 7 years ago

This is indeed a problem caused when another device is not behaving as it should.

Because the target devices is wrongly waiting for an ACK before sending the next segment, BACnet4J sees this as a communication error and tries again. It does so endlessly and apparently as many times and as fast as it can (which causes communication problems for the entire network).

It's been a few month since I've studied this problem, but I think it was related to this logic:

https://github.com/infiniteautomation/BACnet4J/blob/master/src/main/java/com/serotonin/bacnet4j/transport/DefaultTransport.java#L825

I've "solved" the issue on a fork by simply going straight to a BACnet error. Hopefully you'll come up with a better solution.

mlohbihler commented 7 years ago

I moved the 'umIter.remove();' statement up so that it applies to the whole timeout section. This should have largely the same effect as your change, but preserves the sending of the NAK (although i can't find the call for this in the specification at the moment).

Frozenlock commented 6 years ago

Couldn't retest on the problematic network for months; I'll just close this and reopen if it turns out the fix wasn't sufficient.