Avnu / OpenAvnu

OpenAvnu - an Avnu sponsored repository for Time Sensitive Network (TSN and AVB) technology
464 stars 289 forks source link

aPTP errors. "Follow Up that didn't match.." etc. #825

Open m-nilsson opened 5 years ago

m-nilsson commented 5 years ago

Hi,

We are trying to use the gPTP daemon together with Apple AirPlay2 SDK on an ARM platform but ran into some problems. It doesn't seem to work well on our platform. We are using the ArtAndLogic-aPTP-changes branch.

We keep getting those error messages in the log: "Received Follow Up that didn't match the sync message. sync.seqId:683 followup.seqId:683".

Most of the time the seqIds are identical but sometimes they differ.

Another problem is that the function LinuxNetworkInterface::send() sometimes fails and we can see that the IpVersion is then 0 or 9 which seems wrong. I guess it is supposed to be either 4 or 6 (which it is when it succeeds). It could look like this: ERROR : GPTP [13:05:13:410] Failed to send to ip '::': Network is unreachable ERROR : GPTP [13:28:38:215] Failed to send to ip '::80.80.171.0': Network is unreachable

The AirPlay SDK seems to add nodes with both its ipv4 and ipv6 address.

The log file starts out like this: Failed to configure hardware timestamping: Operation not supported ERROR : GPTP [14:06:48:041] Attempting to configure software timestamping INFO : GPTP [14:06:48:041] findPTPIndex: info.phc_index: -1 ERROR : GPTP [14:06:48:041] Failed to find PTP device index ERROR : GPTP [14:06:48:042] Failed to initialize hardware timestamper, falling back to software timestamping

Then we see some errors in the log saying "Error getting link delay".

One node keeps printing those lines in the log repeatedly: STATUS : GPTP [14:48:58:222] Signalling Link Delay Interval: -128 STATUS : GPTP [14:48:58:223] Signalling Sync Interval: -3 STATUS : GPTP [14:48:58:223] Signalling Announce Interval: -128

I know this is many questions and a bit messy, but we just need some advice on how to move further and some understanding of what those messages mean and what could cause them. What is normal and what is not.

Any help is appreciated. Thanks in advance!

rroussel commented 5 years ago

Hi @m-nilsson ,

I will attempt to address some of your questions.

1) "Received Follow Up that didn't match the sync message. sync.seqId:683 followup.seqId:683". This message happens when message processing gets out of sync. We've seen this happen more on slower ARM based processors. The key is that many sync messages do make it through. Also key is that there is logic within the actual calculation of path delay etc. that limits the number of updates - to get cleaner values. So the fact that things get out of sync for a few hundred milliseconds shouldn't impact the over all time sync process. We also just pushed code this morning that should address an issue where the client clocks were not "locking" to the master clock. Please do a "pull" to get these changes.

2) "The function LinuxNetworkInterface::send() sometimes fails and we can see that the IpVersion is then 0 or 9 which seems wrong. I guess it is supposed to be either 4 or 6 (which it is when it succeeds)." This seems like a local networking issue. Are the correct IP addresses added to your group? Can you ssh to the ARM based devices? Does your local network support ipv6 addressing?

3) "The AirPlay SDK seems to add nodes with both its ipv4 and ipv6 address." Yes this is correct, the AirPlay SDK does add nodes with both ipv4 and ipv6 addresses. The aPTP version of the OpenAvnu code can process messages from ipv4 or ipv6 and at the same time i.e. it can send or receive ipv4 or ipv6 based messages within the same process.

4) "Then we see some errors in the log saying "Error getting link delay"." This is normal when the client and master clocks negotiate who becomes master. If you continue to see these issues in the log then there is a bigger issue a foot. Please do pull to get the latest as the latest changes may fix this.

m-nilsson commented 5 years ago

Thanks for your reply!

We are now using the latest commit and some of our problems are gone, however some problems still exist.

When streaming audio to only one single device, that device operates in clock master mode and if streaming audio to two devices one of them operates in master mode and the other one in slave mode. We see a difference between a master and a slave device.

A master device will have a log without errors, except some in the beginning which I guess is normal. Then the log looks like this:

STATUS   : GPTP [08:31:47:471] Signalling Link Delay Interval: -128
STATUS   : GPTP [08:31:47:471] Signalling Sync Interval: 1
STATUS   : GPTP [08:31:47:472] Signalling Announce Interval: -128
STATUS   : GPTP [08:31:47:476] Signalling Link Delay Interval: -128
STATUS   : GPTP [08:31:47:476] Signalling Sync Interval: 1
STATUS   : GPTP [08:31:47:477] Signalling Announce Interval: -128
STATUS   : GPTP [08:31:48:524] Signalling Link Delay Interval: -128
STATUS   : GPTP [08:31:48:525] Signalling Sync Interval: 1
STATUS   : GPTP [08:31:48:525] Signalling Announce Interval: -128

And after a while it stops writing to the log but continues to function.

However on the slave device there are a lot of those "follow up that didn't match" errors. The log looks like this:

ERROR    : GPTP [08:31:23:873] Received Follow Up that didn't match the sync message.
sync.seqId:629 followup.seqId:547
ERROR    : GPTP [08:31:23:873]        Sync clock id:88:33:14:FF:FE:0F:42:59
ERROR    : GPTP [08:31:23:873] source Port Identity:9D:FC:01:17:20:D0:00:00
INFO     : GPTP [08:31:23:921] Sync delta: 0.050611
INFO     : GPTP [08:31:23:923] Sync delta: 0.001527
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):10.3366
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):0.0967437
STATUS   : GPTP [08:31:23:988] Signalling Link Delay Interval: 0
STATUS   : GPTP [08:31:23:988] Signalling Sync Interval: 0
STATUS   : GPTP [08:31:23:989] Signalling Announce Interval: 3
INFO     : GPTP [08:31:23:999] Sync delta: 0.075538
ERROR    : GPTP [08:31:24:000] Received Follow Up that didn't match the sync message.
sync.seqId:630 followup.seqId:548
ERROR    : GPTP [08:31:24:000]        Sync clock id:88:33:14:FF:FE:0F:42:59
ERROR    : GPTP [08:31:24:001] source Port Identity:9D:FC:01:17:20:D0:00:00
INFO     : GPTP [08:31:24:049] Sync delta: 0.050905
INFO     : GPTP [08:31:24:051] Sync delta: 0.001215
INFO     : GPTP [08:31:24:125] Sync delta: 0.074620
ERROR    : GPTP [08:31:24:129] Received Follow Up that didn't match the sync message.
sync.seqId:631 followup.seqId:549
ERROR    : GPTP [08:31:24:129]        Sync clock id:88:33:14:FF:FE:0F:42:59
ERROR    : GPTP [08:31:24:129] source Port Identity:9D:FC:01:17:20:D0:00:00
INFO     : GPTP [08:31:24:173] Sync delta: 0.048026
INFO     : GPTP [08:31:24:175] Sync delta: 0.002228
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):6.25493
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):0.159874
STATUS   : GPTP [08:31:24:238] Signalling Link Delay Interval: 0
STATUS   : GPTP [08:31:24:239] Signalling Sync Interval: 0
STATUS   : GPTP [08:31:24:239] Signalling Announce Interval: 3
STATUS   : GPTP [08:31:24:240] Signalling Link Delay Interval: 0
STATUS   : GPTP [08:31:24:240] Signalling Sync Interval: 0
STATUS   : GPTP [08:31:24:240] Signalling Announce Interval: 3
INFO     : GPTP [08:31:24:258] Sync delta: 0.082793
ERROR    : GPTP [08:31:24:260] Received Follow Up that didn't match the sync message.
sync.seqId:632 followup.seqId:550
ERROR    : GPTP [08:31:24:260]        Sync clock id:88:33:14:FF:FE:0F:42:59
ERROR    : GPTP [08:31:24:260] source Port Identity:9D:FC:01:17:20:D0:00:00

We have ssh access to the ARM devices and we are only using 25% of the cpu which runs at 600MHz. Any thoughts?

rroussel commented 5 years ago

Hi @m-nilsson ,

A few thoughts...

1) If your slave nodes will always be slaves you can force them to be slaves by setting the priority1 value to 255 in the gptp_cfg.ini. This will prevent the BMCA from changing the master. The recent refactoring may have broken the BMCA, I'm looking into this.


# Priority1 value
# It can assume values between 0 and 255.
# The lower the number, the higher the priority for the BMCA.
priority1 = 255

2) There was a check in PTPMessageSync::processMessage that was lost in a prior commit that was put back this morning, so you will want to pull again. This will prevent processing of its own sync messages.

3) This is what your slave log should look like when it locks to the master. There will still be instances of the error "Received Follow Up that didn't match the sync message". So long as the slave is locked you should be good to go.

INFO     : GPTP [12:51:17:364] Sync delta: 0.127436
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):-293.952
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):-0.00340191
INFO     : GPTP [12:51:17:488] Sync delta: 0.124767
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):-276.022
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):-0.0036229
INFO     : GPTP [12:51:17:614] Sync delta: 0.125760
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):40.352
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):0.0247819
INFO     : GPTP [12:51:17:740] Sync delta: 0.126159
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):3.86596
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):0.258668
INFO     : GPTP [12:51:17:866] Sync delta: 0.125936
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):0.7402
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):1.35099
INFO     : GPTP [12:51:17:993] Sync delta: 0.126333
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):-25.1892
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):-0.0396995
INFO     : GPTP [12:51:18:118] Sync delta: 0.125703
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):-1.98534
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):-0.503692
INFO     : GPTP [12:51:18:245] Sync delta: 0.126224
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):0.509966
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):1.96091
STATUS   : GPTP [12:51:18:346] Signalling Link Delay Interval: 0
STATUS   : GPTP [12:51:18:346] Signalling Sync Interval: 0
STATUS   : GPTP [12:51:18:346] Signalling Announce Interval: 2
INFO     : GPTP [12:51:18:370] Sync delta: 0.125486
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):14.852
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):0.067331
INFO     : GPTP [12:51:18:496] Sync delta: 0.126198
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):0.497913
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):2.00838
INFO     : GPTP [12:51:18:622] Sync delta: 0.125726
andrew-elder commented 5 years ago

There was a check in PTPMessageSync::processMessage that was lost in a prior commit that was put back this morning, so you will want to pull again. This will prevent processing of its own sync messages.

I assume this is in a local repo somewhere?

rroussel commented 5 years ago

Hi @andrew-elder,

Yes this is actually in a forked repo: https://github.com/rroussel/OpenAvnu/tree/ArtAndLogic-aPTP-changes

Sadly github doesn't seem to allow issues to be tracked against the fork but pushes them back to the repo from which the fork originated.

m-nilsson commented 5 years ago

I am getting the error around eight times per second and as you can see in the log below it expects seqIds 3247, 3248, 3249, 3250 but receives 3271, 3272, 3273, 3274. It doesn't look healthy.

Time sync is not totally off, but it doesn't seem perfect either.

And do you have any idea why the master stops writing to the log after a while? The slave continues to write.

There are three devices involved in my setup. One iPhone and two ARM based speakers.

ERROR    : GPTP [06:49:25:374] Received Follow Up that didn't match the sync message.
sync.seqId:3271 followup.seqId:3247
ERROR    : GPTP [06:49:25:375]        Sync clock id:88:33:14:FF:FE:0F:42:59
ERROR    : GPTP [06:49:25:375] source Port Identity:9D:FC:01:17:20:D0:00:00
INFO     : GPTP [06:49:25:484] Sync delta: 0.110983
INFO     : GPTP [06:49:25:486] Sync delta: 0.002771
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):0.976327
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):1.02425
INFO     : GPTP [06:49:25:500] Sync delta: 0.013425
ERROR    : GPTP [06:49:25:506] Received Follow Up that didn't match the sync message.
sync.seqId:3272 followup.seqId:3248
ERROR    : GPTP [06:49:25:506]        Sync clock id:88:33:14:FF:FE:0F:42:59
ERROR    : GPTP [06:49:25:507] source Port Identity:9D:FC:01:17:20:D0:00:00
STATUS   : GPTP [06:49:25:520] Signalling Link Delay Interval: 0
STATUS   : GPTP [06:49:25:520] Signalling Sync Interval: 0
STATUS   : GPTP [06:49:25:521] Signalling Announce Interval: 3
INFO     : GPTP [06:49:25:611] Sync delta: 0.110546
INFO     : GPTP [06:49:25:613] Sync delta: 0.002996
RR1((((t4-t1)/2) - ((lt4-lt1)/2)) / (((t3-t2)/2) - ((lt3-lt2)/2))):1.59449
RR2((((t3-t2)/2) - ((lt3-lt2)/2)) / (((t4-t1)/2) - ((lt4-lt1)/2))):0.627162
INFO     : GPTP [06:49:25:628] Sync delta: 0.014608
ERROR    : GPTP [06:49:25:629] Received Follow Up that didn't match the sync message.
sync.seqId:3273 followup.seqId:3249
ERROR    : GPTP [06:49:25:630]        Sync clock id:88:33:14:FF:FE:0F:42:59
ERROR    : GPTP [06:49:25:631] source Port Identity:9D:FC:01:17:20:D0:00:00
INFO     : GPTP [06:49:25:742] Sync delta: 0.114029
INFO     : GPTP [06:49:25:745] Sync delta: 0.002953
INFO     : GPTP [06:49:25:754] Sync delta: 0.008580
ERROR    : GPTP [06:49:25:756] Received Follow Up that didn't match the sync message.
sync.seqId:3274 followup.seqId:3250
m-nilsson commented 5 years ago

@rroussel , Do you have any idea what the problem on the slave could be? I'm getting "Follow that didn't match" about 10 times per second and get the "RR1/RR2" about 3 times per second. I have tried different Wifi networks and there is no difference.

rroussel commented 5 years ago

@m-nilsson ,

1) I've pushed a new version of the code that addresses the out of order sync messages. Please pull and give it a try. 2) I've also adjusted the logging so that the out of order sync messages are now a warning as all sync messages are now processed with the retry logic. 3) The RR1, RR2 messages are not errors but rather a way to see how the rate ratios are being calculated over time. 4) If you want to see all the detail with what is happening with each message and the calculations you can change the logging levels in gptp/common/gptp_log.hpp, uncomment one or both of these lines:

#define GPTP_LOG_INFO_ON            1
#define GPTP_LOG_STATUS_ON          1
//#define GPTP_LOG_DEBUG_ON         1
//#define GPTP_LOG_VERBOSE_ON       1
rroussel commented 5 years ago

@m-nilsson,

I've pushed another update. This version has a more robust mechanism for processing all the ptp messages. It also fixes an issue with failing to lock to the grand master clock.

iamnagaky commented 5 years ago

Hi @rroussel , I am porting the Art & Logic aPTP stack to ThreadX platform on an ARM processor for AirPlay 2. The issues faced by me are:

  1. I have set my DUT to slave mode during initialisation of the stack & after receiving the ANNOUNCE message from the iPhone, the isBetterThan() api returns that the PortIdentity of my DUT is better than that of the iPhone's PortIdentity. So i have reversed the memcmp logic in isBetterThan with a greater than operator. Thus now my iPhone is set as GrandMaster instead of my DUT itself. So what's the logic behind the memcmp of the 14bytes in isBetterThan()

  2. After the workaround done above, i was able to receive SYNC, FOLLOWUP, RESPONSE from the iPhone, but during offset calculation in MaybePerformCalculations() the sequence id's of all the four messages must be matched. In my case the sequence id of followup messages are far beyond the SYNC, REQUEST messages as mentioned by @m-nilsson . The FOLLOWUP sequence id doesn't cope up with others even for a single time.

  3. For the case 2, i have modified the thread priority of generalPort thread and the FOLLOWUP sequence id catch up with other messages sometimes. But the RESPONSE messages never catch up with other three messages. When i checked the Wireshark capture of the same, the Response is sent from the iPhone with a bit delay sometimes. So even if the SYNC, FOLLOWUP, REQUEST sequence ids are same, the RESPONSE sequence id is always falling behind. Kindly advice how this could be handled?