Open tsawyer opened 5 months ago
Suggest also logging out-of-bounds packets (packet took longer than buflen
to get to host), similar to out-of-bounds packet logging in VOTER client console.
Question - is there any circumstance (packet OOB, packet OOO, corrupt, etc) that causes a chan_voter
instance to drop a client (remove client from VOTER display) other than a keepalive timeout, or does the loss of a client from the host perspective ALWAYS signify a keepalive timeout?
To my knowledge only a timeout (five missed pings) cause a RTCM (voter or missed client) to disconnect. Disconnects are indicated by a RTCM display dropout and a lost link CW message if a transmitter is connected.
Having said that, I've long had my suspicions that dropouts occur when there is no apparent network problem. So maybe you're on to something here.
I have noticed at times one of my TX/RX clients will flash on/off rapidly (a few times per second) in the VOTER display, Allmon, and Supermon. When this "flashing" occurs, there are gaps in the TX audio for that site.
The VOTER clients console shows no messages, and neither does the host Asterisk console whenever this "flashing" occurs.
The client in question is two 5GHz Ubiquiti hops away from the host, so it is difficult to determine the cause of these "drops". TX and RX buffers are set well above the actual latency of the network.
More logging in chan_voter
would greatly help troubleshooting this.
More logging in
chan_voter
would greatly help troubleshooting this.
Have you enabled debug logging for the module on your system? (e.g. core set debug 10 chan_voter
)
Have you enabled debug logging for the module on your system? (e.g.
core set debug 10 chan_voter
)
Yes, I have enabled debug logging, but it generates a significant amount of output (a few hundred lines per second), which is quite difficult to sift through:
[2024-08-03 19:07:42.802] DEBUG[687]: chan_voter.c:3850 voter_reader: Got rx packet, len 185 payload 1 challenge 574376273 digest 3ea42dd5
[2024-08-03 19:07:42.802] DEBUG[694]: chan_voter.c:2345 voter_xmit: Voter 1998: Sending KEEPALIVE (GPS) packet to client Client1 digest 2b8c5883
[2024-08-03 19:07:42.802] DEBUG[694]: chan_voter.c:2345 voter_xmit: Voter 1998: Sending KEEPALIVE (GPS) packet to client Client2 digest 59d79563
[2024-08-03 19:07:42.802] DEBUG[694]: chan_voter.c:2345 voter_xmit: Voter 1998: Sending KEEPALIVE (GPS) packet to client Client3 digest 764429f6
[2024-08-03 19:07:42.803] DEBUG[687]: chan_voter.c:3850 voter_reader: Got rx packet, len 185 payload 1 challenge 22652796 digest 008b685e
...
Out-of-bounds packets are logged as an error message on the client side in the VOTER/RTCM firmware but are not logged on the host side in a manner that's easily detectable.
To address this, I suggest logging these packets as actual warnings or errors, at the appropriate warning/error levels. This way, they will appear in the console/logs without requiring debug-level logging. This change would make it much easier to identify and address issues without having to manually parse through extensive debug logs.
Example: packet received by TallMountainSite 39ms too late for the RX buffer to accept
[2024-08-02 13:14:16.067] WARNING[687] chan_voter.c: Received [PAYLOAD 1] from TallMountainSite out of bounds: +39ms
To address this, I suggest logging these packets as actual warnings or errors, at the appropriate warning/error levels. This way, they will appear in the console/logs without requiring debug-level logging. This change would make it much easier to identify and address issues without having to manually parse through extensive debug logs.
No, please. The Asterisk console is too noisy already. Sending debug type messages as warnings or errors to the console is wholly inappropriate. Instead search the log or grep it in realtime... tail -f /var/log/asterisk/messages.log | grep "out of bounds"
.
No, please. The Asterisk console is too noisy already. Sending debug type messages as warnings or errors to the console is wholly inappropriate. Instead search the log or grep it in realtime...
tail -f /var/log/asterisk/messages.log | grep "out of bounds"
.
While I agree that sending messages to the default level console should be used sparingly, I would argue that an out-of-bounds RX packet is an ERROR in an otherwise working system and should most certainly be presented to the console as an error (or at the very least a warning). The debug level information is just status type messages, not errors or warnings. It seems rather illogical to bury an actual ERROR message beneath heaps of mostly useless debugging information.
I am aware that the messages could be grep'd from the console/log, but they shouldn't have to be. Only systems with an active VOTER implementation would see these messages. As an administrator of one of these systems, this is error level information that I would want logged at ALL times... not just when I'm manually logged into the CLI and have cranked up the levels.
No, please. The Asterisk console is too noisy already. Sending debug type messages as warnings or errors to the console is wholly inappropriate. Instead search the log or grep it in realtime...
tail -f /var/log/asterisk/messages.log | grep "out of bounds"
.While I agree that sending messages to the default level console should be used sparingly, I would argue that an out-of-bounds RX packet is an ERROR in an otherwise working system and should most certainly be presented to the console as an error (or at the very least a warning). The debug level information is just status type messages, not errors or warnings. It seems rather illogical to bury an actual ERROR message beneath heaps of mostly useless debugging information.
I am aware that the messages could be grep'd from the console/log, but they shouldn't have to be. Only systems with an active VOTER implementation would see these messages. As an administrator of one of these systems, this is error level information that I would want logged at ALL times... not just when I'm manually logged into the CLI and have cranked up the levels.
I'm not familiar with the context of what actually causes this warning (I'm an Asterisk developer, not an ASL user), so I don't want to commit to a particular approach, but I agree if it actually means something is wrong, it should be a warning, and vice versa.
Error in Asterisk is generally reserved for specific cases and I'm not sure would be appropriate here but a warning seems logical. I think the main question is whether it would be possible in a correctly configuration system (on both sides) to have this happen. If the user can't control it, generally warning/error is a bit extreme.
As an administrator of one of these systems, this is error level information that I would want logged at ALL times... not just when I'm manually logged into the CLI and have cranked up the levels.
FYI, for reference, you can also control which log levels are logged in logger.conf, including verbose messages or debug messages up to a certain level, for example.
If the user can't control it, generally warning/error is a bit extreme.
This appears to be a network error happening on one link (a Wifi link) to a RTCM. The other RTCM links are fine if I read the OP correctly. Therefore this is not an Asterisk issue. That's not to say there isn't something with chan_voter misbehaving with excessive OOB packets.
@InterLinked1 How do other channel drivers handle poor network performance in terms of warnings or errors? I don't recall seeing IAX or SIP report OOB issues. I believe the correct approach is something like iax2 show netstats
.
@Mason10198 What do you get for voter ping sitexx 100
?
A good link will repeatedly give:
PING (SantaYnez): Packets tx: 100, rx: 100, oos: 0, Avg.: 9.540 ms
PING (SantaYnez): Worst: 13 ms, Best: 8 ms, 100.0% Packets successfully received (0.0% loss)
If that's ok, I'm thinking this could be a GPS/timing issue.
If the user can't control it, generally warning/error is a bit extreme.
This appears to be a network error happening on one link (a Wifi link) to a RTCM. The other RTCM links are fine if I read the OP correctly. Therefore this is not an Asterisk issue. That's not to say there isn't something with chan_voter misbehaving with excessive OOB packets.
Okay, thanks for the background. I was thinking "out of bounds" meant something like the packet itself was malformed. If this can happen purely due to network conditions then that's a different matter.
Even if it's not something the user can influence, if it fundamentally causes major undesired behavior (something not working) a warning might still be appropriate. Can someone clarify what the actual impact of this issue happening is? Does the node become unusable or something like that? Is audio dropped completely?
@InterLinked1 How do other channel drivers handle poor network performance in terms of warnings or errors? I don't recall seeing IAX or SIP report OOB issues. I believe the correct approach is something like
iax2 show netstats
.
If it's during a call, I don't think there's much reporting for those sorts of things. The RTP stats might reflect some of those issues.
This appears to be a network error happening on one link (a Wifi link) to a RTCM. The other RTCM links are fine if I read the OP correctly. Therefore this is not an Asterisk issue. That's not to say there isn't something with chan_voter misbehaving with excessive OOB packets.
@tsawyer Correct. I never meant to imply that this was an issue with Asterisk or it's modules - I was only suggesting the presentation of information that is vital to system administrators.
What do you get for
voter ping sitexx 100
?
PING (VoterClient): Packets tx: 500, rx: 500, oos: 0, Avg.: 10.648 ms
PING (VoterClient): Worst: 27 ms, Best: 7 ms, 100.0% Packets successfully received (0.0% loss)
and I have buflen = 480
in my voter.conf
If that's ok, I'm thinking this could be a GPS/timing issue.
I am aware of the possible causes of the issue I am experiencing. I am not trying to report or solve any issues here in this repo, I'm just suggesting better logging to provide vital information to system administrators that might encounter this (or similar) issues in order to make tracking down and resolving network and timing issues significantly easier.
I think the main question is whether it would be possible in a correctly configuration system (on both sides) to have this happen. If the user can't control it, generally warning/error is a bit extreme.
@InterLinked1 This has nothing to do with errors in Asterisk itself, I apologize if I did not make that clear. You are correct that ERROR would be extreme. WARNING should be appropriate.
I was thinking "out of bounds" meant something like the packet itself was malformed. If this can happen purely due to network conditions then that's a different matter.
Correct, "out of bounds" just means the packet took longer than buflen
(voter.conf) to arrive at the host. The problem is that there is no sort of message whatsoever when one of the out of bounds packets is received by chan_voter
.
Even if it's not something the user can influence, if it fundamentally causes major undesired behavior (something not working) a warning might still be appropriate. Can someone clarify what the actual impact of this issue happening is? Does the node become unusable or something like that? Is audio dropped completely?
Lets ignore the TX chain of the VOTER system for a second and only focus on the RX chain.
buflen
in voter.conf
.(current_time - packet_timestamp) > buflen
, then the packet arrived too late and is "out of bounds"My suggestion is simply to have some kind of warning message when these "out of bounds" packets are detected, that does not require manually increasing the debug levels. My opinion is that this should not be "debugging" information and should be levelled higher than debugging info.
My suggestion is simply to have some kind of warning message when these "out of bounds" packets are detected, that does not require manually increasing the debug levels. My opinion is that this should not be "debugging" information and should be levelled higher than debugging info.
All of your points are well stated but I'm not convinced poor network performance should be logged to the console. Nor am I convinced setting a debug level is an undue burden. I opened this issue to clean up the very annoying and inappropriate voter timeout messages. Adding what would be another annoying message is going the wrong direction.
I believe a better approach would be to have an Asterisk CLI command similar to iax2 show netstats
, perhaps voter show netstats
. That command could provide ping times, out of order, out of band, and other useful historical and realtime stats without cluttering the Asterisk CLI.
Log Voter Client
-- Voter client OatMtn disconnect (timeout)
message with a NOTICE status.