Avnu / OpenAvnu

OpenAvnu - an Avnu sponsored repository for Time Sensitive Network (TSN and AVB) technology
467 stars 288 forks source link

followUp message handling & getLinkDelay #645

Open vitorroriz opened 7 years ago

vitorroriz commented 7 years ago

Dear colleagues,

My setup is: 2 devices, running Linux imx7dsabresd 4.1.15, directly connected through the eth1 interfaces of each device. I am using the open-avb-next branch, and I added the ptp group for both devices, through groupadd ptp.

My initial aim is to get synchronization with the gPTP daemon so I can move on.


Running: ./daemon_cl eth1 -S -T -INITSYNC 0 -OPERSYNC 0 -INITPDELAY 0 -OPeERPDELAY 0 &> d.log and ./daemon_cl eth1 -S -L -INITSYNC 0 -OPERSYNC 0 -INITPDELAY 0 -OPeERPDELAY 0 &> d.log

By reading shm/ptp on the slave node I noticed that all its values were 0. I tracked the problem and it was happening because of the handler of followUp messages. When executing getLinkDelay, if it returns false the handler will execute goto done and therefore, setMasterOffset will not be called. The issue is that if the delay <= 0, delay is set to 0, and this is interpreted as a false return, as we were handling a null pointer.

    /**
     * @brief  Gets the link delay information.
     * @param  [in] delay Pointer to the delay information
     * @return True if valid, false if invalid
     */
    bool getLinkDelay( uint64_t *delay )
    {
        if(delay == NULL) {
            fprintf(stderr, "VITOR DEBUG: problem on the delay var, &delay == null\n");
            return false;

        }
        *delay = getLinkDelay();
        return *delay < INVALID_LINKDELAY;
    }

    /**
     * @brief  Sets link delay information.
     * Signed value allows this to be negative result because
     * of inaccurate timestamps.
     * @param  delay Link delay
     * @return True if one_way_delay is lower or equal than neighbor
     * propagation delay threshold False otherwise
     */
    bool setLinkDelay( int64_t delay )
    {
        one_way_delay = delay;
        int64_t abs_delay = (one_way_delay < 0 ?
                     -one_way_delay : one_way_delay);

        if (testMode) {
            GPTP_LOG_STATUS("Link delay: %d", delay);
        }

        return (abs_delay <= neighbor_prop_delay_thresh);
    }
andrew-elder commented 7 years ago

Interesting. Do you have any ideas on how to fix it?

andrew-elder commented 7 years ago

Maybe the returned link delay should always be "ok" and the decision about what action to take on the linkDelay should be at a higher level?

vitorroriz commented 7 years ago

For now I am just ignoring the Null pointer checking and calling port->getLinkDelay(&delay); anyway. I am just not sure if it is ok. Is it intentional that returns of getLinkDelay smaller or equal than 0 are being interpreted as false returns? Or can we make the comparison with a literal, for eg, instead of if( !port->getLinkDelay(&delay) ) ?

pinealservo commented 7 years ago

I'm not entirely clear on whether it was intentional or not, but by using an unsigned pointer to hold the link delay, any negative link delays will be translated into large positive link delays that will fall outside the INVALID_LINKDELAY check. But whether it's intentional or not, it's logically correct.

If you have calculated link delay values smaller than or equal to 0, you should not be getting any usable output on the slave device. It's physically impossible for such a value to be correct. The calculated delays in this case are wrong, probably because you don't have properly calibrated phy delay parameters set up for your hardware. PTP can't be expected to work without reasonably calibrated phy delay values.

Since you have your devices connected back-to-back, you should be able to measure the length of your cable and use some information such as from sites like these (https://en.wikipedia.org/wiki/Category_5_cable, https://www.eeweb.com/toolbox/twisted-pair) to calculate the expected propagation delay. By examining where the Tx and Rx delay values are used in the calculations, you should be able to tweak the values until you see measured link delay values that are reasonable for your cable length.

We should definitely have some better diagnostic reporting when invalid link delay values occur, but I think it's better that we don't pretend to calculate outputs when we don't have good inputs.

bdthomsen commented 7 years ago

I have to slightly disagree with the previous comment by @pinealservo . While the average link delay should always be greater than 0, it is perfectly reasonable to expect negative link values, and require devices to correctly handle them. For example, if you have two devices that calculate the link delay with 20 ns precision, and they are connected using a 1 meter cable (with a delay of 5 ns), then you can legitimately calculate link delays as low as -15 ns when the devices are correctly calibrated. Of course, not all devices handle negative link delays correctly, and I have had to make my devices use inaccurate numbers to make sure the other side doesn't derive negative values in some situations. However, ideally, calculated link delay values off by -20 and +20 should both be considered equally precise, and should both be allowed so the average link delay is more accurate.

andrew-elder commented 7 years ago

That's a good point. We actually have additional (incorrect) phy compensation in our devices to avoid -ve delays as well.

jdkoftinoff commented 7 years ago

Yes it is a funny thing, if the PHY compensation is on the edge then you end up with a “minimum cable length” for your links!

On Jun 30, 2017, at 10:18, andrew-elder notifications@github.com wrote:

That's a good point. We actually have additional (incorrect) phy compensation in our devices to avoid -ve delays as well.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/AVnu/OpenAvnu/issues/645#issuecomment-312324651, or mute the thread https://github.com/notifications/unsubscribe-auth/AAMgDTdCBWJjStWVdtdbeUDUKz4Yqgriks5sJS32gaJpZM4OJSo4.