facebook / fboss

Facebook Open Switching System Software for controlling network switches.
Other
860 stars 295 forks source link

Arp log bump #11

Closed capveg closed 6 years ago

capveg commented 8 years ago
Bumped up the priority of the "Not Mine" arphandler msg

This is a common misconfig and should be more obvious
than priority 5.
simpkins commented 8 years ago

Hey Rob, I think we might not actually want to take this change internally. If I'm understanding the logic correctly, can't this be triggered in legitimate cases that aren't a misconfig? Won't this be triggered for any broadcast ARP request seen for other hosts in the VLAN? (We generally run the fboss agent with --v=2 by default, which is why I'm reluctant to take this diff it can be triggered in normal situations. I'd be okay with bumping it to 4 or 3 though.)

If you have LLDP available in your setup, that might be a better way of tracking down cabling errors. The getLldpNeighbors() thrift call will report info on all neighbors we have learned about.

capveg commented 8 years ago

@simpkins I'm happy to bump this to a '3' -- I wasn't sure where the threshold was.

That said, I've lost a lot of my life wondering why the switch was dropping packets only to have it be related to a vlan misconfiguration, so I think this is still worth considering. I can't think of a situation where the switch is dropping packets here and you wouldn't want to know about it.

That said, I'm happy to follow you're lead here. Just let me know.

simpkins commented 8 years ago

Bumping it to a 3 sounds good to me.

I would expect this log message to be triggered relatively often in non-misconfigured scenarios. Consider the following scenario: the switch has IP 10.1.1.1/24 configured on a VLAN containing ports 1-10. Port 1 is connected to a server 10.1.1.2, and port 2 is connected to a server 10.1.1.3. If 10.1.1.2 wants to talk to 10.1.1.3, it will send out a broadcast ARP request for 10.1.1.3. I haven't tested this recently to confirm 100%, but I believe the ARP request packet will get copied to the switch CPU (in addition to being broadcast out ports 2-10), and it will get correctly ignored via this code path. 10.1.1.3 will respond to the request in this scenario.

capveg commented 8 years ago

Patch applied -- thanks for the feedback.

simpkins commented 8 years ago

Actually, I was just thinking about it a bit more, and we could log the message at a higher level if the IP is for a subnet that isn't configured on this interface at all. Receiving an ARP packet for an IP in a known subnet is probably normal, but an IP address in an unknown subnet is definitely more likely to be a misconfiguration.

Anyway, I'll go ahead and apply your current patch tomorrow. If you want to also update the code to check for unconfigured subnets and log a more verbose message in that case, feel free to send a patch for that :-) Otherwise I'll probably go ahead and create a low-priority task for one of our team members, but no promises on getting it done soon.

capveg commented 6 years ago

Ancient -- removing...