Exa-Networks / exabgp

The BGP swiss army knife of networking
Other
2.05k stars 440 forks source link

Peer reset log message missing after upgrade to 4.2.21 #1191

Closed stilwelld closed 7 months ago

stilwelld commented 7 months ago

Describe the bug

After upgrading from 3.4 to 4.2.21 we are no longer seeing the peer reset message in the logs. peer reset, message [closing connection] error[Broken TCP connection] We use this message to generate alarms when a peer has gone down or restarted.

To Reproduce

The root cause of the issue is that the logger in the protocol close() function changed from INFO level (network) to DEBUG level (packets) in the 4.2 release. We verified it is still at DEBUG level in the main branch.

[ src/exabgp/reactor/protocol.py ]
123            log.debug(reason, self.connection.session())

In the 3.4 release the message was logged at INFO level under the network log setting.

[ lib/exabgp/reactor/protocol.py ]
118            self.logger.network(self.me(reason))

The only way we can see the "peer reset" message in 4.2.21 is with log level DEBUG and packets = true. This setting results in too much logging, impacting the performance of the system.

Expected behavior

To be able to see the "peer reset" log message with INFO logging level.

Environment:

Additional context

thomas-mangin commented 7 months ago

Please let me know if it does the trick, if so I will backport.

stilwelld commented 7 months ago

Thomas, thank you for the quick reply, I don't think this will resolve the issue. When the reactor/peer _reset() is called it will call self._close(), which calls self.proto.close(message). In reactor/protocol.close() the message is logged as DEBUG, we need that changed to INFO to avoid logging all packet traffic. [ src/exabgp/reactor/protocol.py ] before: 123 log.debug(reason, self.connection.session()) after: 123 log.info(reason, self.connection.session())

thomas-mangin commented 7 months ago

The syslog messages are not a supported api. It should not be relied upon as I will be changed if/when required. If there is a way to get things working differently I am open to suggestions.

stilwelld commented 7 months ago

If syslog is not supported, how do we log messages when an event happens that requires an alarm ?

thomas-mangin commented 7 months ago

The api is the way to get information.

thomas-mangin commented 7 months ago

You could look like at examples as I authored a few presentations on the topic which are linked on the wiki

stilwelld commented 7 months ago

I read the following: https://github.com/Exa-Networks/exabgp/wiki/Controlling-ExaBGP-:-interacting-from-the-API But I don't see how that would be used in generating a log message, is there another page I should read ?

thomas-mangin commented 7 months ago

I am sorry that the documentation is so lacking ..

http://thomas.mangin.com/data/pdf/

stilwelld commented 7 months ago

I looked thru the presentations, very interesting work, but I am still not seeing the connection between the API and generating log messages.

Let me restate the issue, in version 3.4 a log message was written for "peer reset" at the INFO level when a peer was brought down or restarted. In our setup this would generate an alarm so the operations team would know one of the elements went offline. When we upgraded to 4.2.21 the log level was changed to DEBUG in the protocol.py close function. Since we don't run in DEBUG mode on our production servers we no longer receive the message in the log. The request was to change the log level in protocol.py close() back to INFO level so we can receive the message in the logs.

thomas-mangin commented 7 months ago

I can not see how I can be clearer.

You decided to parse information printed for humans to read for non-informational use but as a way to interface with the program. This SYSLOG message is not meant to be used by other software and how syslog messages are generated may change again.

You can either go back to using 3.4 or start using 4.2 or master and edit the lines of code you wish to see changed to have it printed when you desire.

I am sure it is an unsatisfactory answer, sorry.

stilwelld commented 7 months ago

Thomas, thank you for your time, we will look into an alternative solution. I'll close out this bug request.

stilwelld commented 7 months ago

Will look into using the "connection close" message instead of using "peer reset" to signal network management when a peer goes down.

thomas-mangin commented 7 months ago

If you find any problem while looking into it, feel free to re-open this issue. I wish you good luck!