Open vitorroriz opened 7 years ago
Interesting. Do you have any ideas on how to fix it?
Maybe the returned link delay should always be "ok" and the decision about what action to take on the linkDelay should be at a higher level?
For now I am just ignoring the Null pointer checking and calling port->getLinkDelay(&delay);
anyway.
I am just not sure if it is ok. Is it intentional that returns of getLinkDelay smaller or equal than 0 are being interpreted as false returns? Or can we make the comparison with a literal, for eg, instead of if( !port->getLinkDelay(&delay) )
?
I'm not entirely clear on whether it was intentional or not, but by using an unsigned pointer to hold the link delay, any negative link delays will be translated into large positive link delays that will fall outside the INVALID_LINKDELAY check. But whether it's intentional or not, it's logically correct.
If you have calculated link delay values smaller than or equal to 0, you should not be getting any usable output on the slave device. It's physically impossible for such a value to be correct. The calculated delays in this case are wrong, probably because you don't have properly calibrated phy delay parameters set up for your hardware. PTP can't be expected to work without reasonably calibrated phy delay values.
Since you have your devices connected back-to-back, you should be able to measure the length of your cable and use some information such as from sites like these (https://en.wikipedia.org/wiki/Category_5_cable, https://www.eeweb.com/toolbox/twisted-pair) to calculate the expected propagation delay. By examining where the Tx and Rx delay values are used in the calculations, you should be able to tweak the values until you see measured link delay values that are reasonable for your cable length.
We should definitely have some better diagnostic reporting when invalid link delay values occur, but I think it's better that we don't pretend to calculate outputs when we don't have good inputs.
I have to slightly disagree with the previous comment by @pinealservo . While the average link delay should always be greater than 0, it is perfectly reasonable to expect negative link values, and require devices to correctly handle them. For example, if you have two devices that calculate the link delay with 20 ns precision, and they are connected using a 1 meter cable (with a delay of 5 ns), then you can legitimately calculate link delays as low as -15 ns when the devices are correctly calibrated. Of course, not all devices handle negative link delays correctly, and I have had to make my devices use inaccurate numbers to make sure the other side doesn't derive negative values in some situations. However, ideally, calculated link delay values off by -20 and +20 should both be considered equally precise, and should both be allowed so the average link delay is more accurate.
That's a good point. We actually have additional (incorrect) phy compensation in our devices to avoid -ve delays as well.
Yes it is a funny thing, if the PHY compensation is on the edge then you end up with a “minimum cable length” for your links!
On Jun 30, 2017, at 10:18, andrew-elder notifications@github.com wrote:
That's a good point. We actually have additional (incorrect) phy compensation in our devices to avoid -ve delays as well.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/AVnu/OpenAvnu/issues/645#issuecomment-312324651, or mute the thread https://github.com/notifications/unsubscribe-auth/AAMgDTdCBWJjStWVdtdbeUDUKz4Yqgriks5sJS32gaJpZM4OJSo4.
Dear colleagues,
My setup is: 2 devices, running Linux imx7dsabresd 4.1.15, directly connected through the eth1 interfaces of each device. I am using the open-avb-next branch, and I added the ptp group for both devices, through groupadd ptp.
My initial aim is to get synchronization with the gPTP daemon so I can move on.
Running:
./daemon_cl eth1 -S -T -INITSYNC 0 -OPERSYNC 0 -INITPDELAY 0 -OPeERPDELAY 0 &> d.log
and./daemon_cl eth1 -S -L -INITSYNC 0 -OPERSYNC 0 -INITPDELAY 0 -OPeERPDELAY 0 &> d.log
By reading shm/ptp on the slave node I noticed that all its values were 0. I tracked the problem and it was happening because of the handler of followUp messages. When executing getLinkDelay, if it returns false the handler will execute
goto done
and therefore, setMasterOffset will not be called. The issue is that if the delay <= 0, delay is set to 0, and this is interpreted as a false return, as we were handling a null pointer.