Xilinx-CNS / sfptpd

Solarflare Enhanced PTP Daemon. Use multiple PTP and PPS sources and sync local clocks together in one integrated application with high-quality timestamp filtering, support for bonds and VLANs, and real-time and long-term stats reporting.

Incorrect PTP clock (phc) selected with ConnectX-4 Lx #15

Closed: sasharozenson closed this issue 2 months ago

sasharozenson commented 2 months ago

Summary

When using sfptpd (version 3.7.1.1007) with a Mellanox Technologies MT27710 Family [ConnectX-4 Lx] NIC, the daemon selects the wrong PHC (PTP hardware clock). This results in incorrect timestamps being applied.

Steps to Reproduce

  1. Manually set the PHC time on the active interface, enp5s0f1np1 (/dev/ptp5), which carries the PTP feed:

    sudo phc_ctl enp5s0f1np1 set 1000000000

    Expected time: Sun Sep 9 01:46:40 UTC 2001 (Unix time 1000000000)

  2. Start the daemon:

    systemctl start sfptpd

  3. The daemon wrongly selects phc4, which belongs to the inactive interface enp5s0f0np0:

    Sep 11 13:25:04 somehost sfptpd[6380]: ptp: clock is phc4(enp5s0f0np0/enp5s0f1np1)

  4. Reading the time from /dev/ptp4 shows a wrong date, about 23 years in the future:

    $ sudo phc_ctl /dev/ptp4 get
    phc_ctl[2210.504]: clock time is 2452138531.803377006 or Sun Sep 15 07:35:31 2047

  5. Meanwhile, /dev/ptp5 is still running from the manually set time:

    $ sudo phc_ctl /dev/ptp5 get
    phc_ctl[2231.775]: clock time is 1000001815.971293202 or Sun Sep  9 04:16:55 2001
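To put a number on the jump seen in steps 4 and 5, the seconds field of the phc_ctl get output can be compared with another timestamp, such as the system clock. This is a minimal sketch that assumes the phc_ctl output format shown above; the helper names phc_seconds and years_between are hypothetical. PHCs usually count TAI, which differs from UTC by tens of seconds, so only the year-scale figure is meaningful here.

```shell
#!/bin/sh
# Hypothetical helper: print the whole-seconds value from a `phc_ctl <dev> get`
# line such as
#   phc_ctl[2210.504]: clock time is 2452138531.803377006 or Sun Sep 15 ...
# Assumes the output format shown in this issue.
phc_seconds() {
    sed -n 's/.*clock time is \([0-9][0-9]*\)\..*/\1/p'
}

# Hypothetical helper: print the signed difference between two timestamps
# (in seconds) as whole years of 365.25 days. Shell integer division
# truncates toward zero.
years_between() {
    echo $(( ($1 - $2) / 31557600 ))
}

# Usage (needs root and the linuxptp phc_ctl tool):
#   phc=$(sudo phc_ctl /dev/ptp4 get 2>&1 | phc_seconds)
#   years_between "$phc" "$(date +%s)"
```

With the two readings from steps 4 and 5, years_between 2452138531 1000001815 prints 46: the two PHCs are about 46 years apart, with /dev/ptp4 roughly 23 years ahead of the date of the report and /dev/ptp5 roughly 23 years behind it.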

Expected Behavior

The daemon should select phc5, the PHC of enp5s0f1np1, rather than phc4, and synchronize it without any large time jumps.

Actual Behavior

Because the wrong PHC (phc4) is selected, that clock is unexpectedly advanced by about 23 years. This happens even when both timestamping_interfaces and interface are explicitly set to enp5s0f1np1.

Debug Information

ethtool output showing that the two interfaces each have their own PHC:

$ sudo ethtool -T enp5s0f1np1 | grep PTP
PTP Hardware Clock: 5

$ sudo ethtool -T enp5s0f0np0 | grep PTP
PTP Hardware Clock: 4
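The same check can be run across every interface at once to map each one to its /dev/ptpN device. This is a minimal sketch that assumes ethtool prints a "PTP Hardware Clock: N" line as in the output above; the helper name phc_index is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -T <iface>` output on stdin and print the
# PHC index, or nothing if the line is missing or reads "none".
phc_index() {
    awk -F': *' '/PTP Hardware Clock/ && $2 ~ /^[0-9]+$/ { print $2 }'
}

# List every network interface that reports a PHC,
# e.g. "enp5s0f1np1 -> /dev/ptp5".
for path in /sys/class/net/*; do
    iface=${path##*/}
    idx=$(ethtool -T "$iface" 2>/dev/null | phc_index)
    if [ -n "$idx" ]; then
        echo "$iface -> /dev/ptp$idx"
    fi
done
```

On the host in this report, such a listing would be expected to show enp5s0f0np0 on /dev/ptp4 and enp5s0f1np1 on /dev/ptp5, matching the two ethtool commands above.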

Hardware: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
SFPTPD: 3.7.1.1007
OS: Debian 11 "Bullseye"

sasharozenson commented 2 months ago

Update: after further investigation, I found the assume_one_phc_per_nic option. Setting it to off seems relevant to our case, because it stops sfptpd from assuming that each NIC has only one PHC.

I will test with this option and report back with results. Thanks.
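One way to check the outcome after restarting with the option set is to compare the PHC named in sfptpd's "ptp: clock is phcN(...)" log line with the index ethtool reports for the interface. This sketch assumes the log format from step 3, that sfptpd runs as the systemd unit sfptpd, and the "PTP Hardware Clock: N" ethtool format shown above; the helper names selected_phc and check_phc are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read sfptpd log lines on stdin and print the PHC index
# from the most recent "ptp: clock is phcN(...)" message.
selected_phc() {
    sed -n 's/.*ptp: clock is phc\([0-9][0-9]*\)(.*/\1/p' | tail -n 1
}

# Hypothetical helper: compare sfptpd's choice with the PHC that ethtool
# reports for the given interface. Needs permission to read the journal.
check_phc() {
    iface=$1
    chosen=$(journalctl -u sfptpd -b --no-pager 2>/dev/null | selected_phc)
    expected=$(ethtool -T "$iface" 2>/dev/null |
        awk -F': *' '/PTP Hardware Clock/ { print $2 }')
    if [ -n "$chosen" ] && [ "$chosen" = "$expected" ]; then
        echo "OK: sfptpd uses phc$chosen for $iface"
    else
        echo "MISMATCH: sfptpd uses phc${chosen:-?}, $iface has phc${expected:-?}"
        return 1
    fi
}

# Usage: check_phc enp5s0f1np1
```

With the behaviour reported in step 3, check_phc enp5s0f1np1 would report a mismatch (phc4 versus phc5); a correct selection would report phc5.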

abower-amd commented 2 months ago

Hi @sasharozenson,

Thanks for raising and closing this issue!

The assume_one_phc_per_nic feature was added to help with some third-party NICs that provide multiple PHC instances which represent the same underlying clock. That scenario doesn't work well with sfptpd because, by default, sfptpd auto-discovers all the NIC clocks in order to sync them together; this is a feature of sfptpd's monolithic design, in contrast with some other software.

However, this workaround has turned out not to be an effective solution, and it also doesn't work well in the case you found, where the clocks really are independent as advertised.

The next release of sfptpd disables this feature by default (ChangeLog entry).

For users of third-party NICs that expose multiple PHC devices which are not independent, the current recommendation is to list the clocks to be synced explicitly with the clock_list option. This does not apply to your setup, however.

Andrew