EttusResearch / uhd

The USRP™ Hardware Driver Repository
http://uhd.ettus.com
Other
942 stars 645 forks source link

X3x0 - repeatedly setting sync source to external delays FPGA time #699

Open mmatthebi opened 10 months ago

mmatthebi commented 10 months ago

Issue Description

I'm using an NI-2974 USRP (which is actually an X310 device), which is attached to an octoclock with 10MHz Ref und PPS Input. I can succesfully synchronize on the external signal and the relative FPGA time (obtained from mb_controller->get_timekeeper()) matches the relative real time.

However, If I repeatedly set the sync source to external, the FPGA time runs slower than the normal time. I suspect some counter is not updated during the set_sync_source call. The call should either be cached or the counter should keep running. The problem does not appear with internal sync source. The problem does not appear with USRP X410.

Setup Details

Connect PPS+Ref from Octoclock to USRP-2974. The USRP runs UHD4.4 on Ubuntu 22.04. UHD was installed from the ettus PPA.

Compile and run below source code using

g++ synctest.cpp -l uhd && ./synctest
#include <iostream>
#include <uhd/rfnoc/mb_controller.hpp>
#include <uhd/rfnoc_graph.hpp>
#include <chrono>
#include <thread>

using uhd::rfnoc::rfnoc_graph;

using namespace std::chrono_literals;

double timeDiff(std::chrono::time_point<std::chrono::system_clock> start) {
        std::chrono::duration<double> diff = std::chrono::system_clock::now() - start;
        return diff.count();
}

double fpgaDiff(uhd::rfnoc::mb_controller::timekeeper::sptr keeper, double fpgaStart) {
        double now = keeper->get_time_now().get_real_secs();
        return now - fpgaStart;
}

int main() {
        auto graph = rfnoc_graph::make("addr=192.168.40.2");
        std::string source = "external";
        graph->get_mb_controller()->set_sync_source(source,source);

        auto keeper = graph->get_mb_controller()->get_timekeeper(0);
        keeper->set_time_next_pps(uhd::time_spec_t(0.0));
        std::this_thread::sleep_for(1s);

        double fpgaStart = keeper->get_time_now().get_real_secs();
        auto start = std::chrono::system_clock::now();

        std::cout << "FPGA time after PPS reset: " << fpgaStart << std::endl;

        for (int i = 0; i < 3; i++) {
                std::this_thread::sleep_for(1s);
                std::cout << "after waiting 1s: sys: " << timeDiff(start) << " fpga " << fpgaDiff(keeper, fpgaStart) << std::endl;
        }

        for(int i = 0; i < 100; i++) {
                graph->get_mb_controller()->set_sync_source(source,source);
        }

        std::cout << "after setting sync source: sys: " << timeDiff(start) << " fpga " << fpgaDiff(keeper, fpgaStart) << std::endl;
        for (int i = 0; i < 3; i++) {
                std::this_thread::sleep_for(1s);
                std::cout << "after waiting 1s: sys: " << timeDiff(start) << " fpga " << fpgaDiff(keeper, fpgaStart) << std::endl;
        }

        return 0;
}

Expected Behavior

I expect that the relative system time and relative FPGA time are equal, regardless how often I call set_sync_source.

Actual Behaviour

Program output is given below. As seen, after calling set_sync_source 100 times in a row, the FPGA time is 0.3seconds behind the system time. Afterwards, the FPGA time runs fine again (however, with a constant 0.3s delay).

mmatthe@NE-LAB-Krypton-10:~$ g++ synctest.cpp -o synctest -luhd && ./synctest 
[INFO] [UHD] linux; GNU C++ version 11.3.0; Boost_107400; UHD_4.4.0.0-0ubuntu1~jammy1
[INFO] [X300] X300 initialization sequence...
[INFO] [X300] Maximum frame size: 1472 bytes.
[INFO] [GPS] Found an internal GPSDO: LC_XO, Firmware Rev 0.929b
[INFO] [X300] Radio 1x clock: 200 MHz
FPGA time after PPS reset: 0.244699
after waiting 1s: sys: 1.00018 fpga 1.00127
after waiting 1s: sys: 2.00144 fpga 2.00253
after waiting 1s: sys: 3.00268 fpga 3.00376
after setting sync source: sys: 11.565 fpga 11.1919
after waiting 1s: sys: 12.5661 fpga 12.1931
after waiting 1s: sys: 13.5673 fpga 13.1943
after waiting 1s: sys: 14.5686 fpga 14.1956

Additional Information

I know that nobody would do stuff like that above. However, in cases where a long-running program is running on the USRP, it might happen that developers have the set_sync_source in some loop and hence have it called multiple times. Then, when using multiple USRPs, they would get out of sync.

mbr0wn commented 3 months ago

I'm not sure we want to "fix" this, because that would mean changing behaviour for existing users. Some background:

When you set the clock source to anything other than internal, then we always reconfigure the LMK (PLL). According to comments in the code, this is so we can force an update, as external clocks can go away. This means the actual clocks stop running for until the PLL has reset. In other words, setting the clock source to external on any X3x0 device will break previously set times.

In fact, this is true for any USRP: If you do change clock sources, previously set times will be off. It's just that X310 forces the clock reset, even if you ask it for the same clock source as it was.

Now, some devices work this way, and others don't. (This inconsistency isn't great, but it's also because of the vastly different underlying hardware). For example, N310 and X410 cache the existing value, but X410 will still allow you to force a reinit if that's what you want. But these devices also take much, much longer to fix a clock.

We have some options:

I'm curious for feedback.

mmatthebi commented 3 months ago

Thanks for the detailed feedback on this behaviour! All options are viable, I would prefer option 1, i.e. only reconfigure if needed, but make sure the clocks are locked. From a user perspective this is in my eyes the most reasonable behaviour. In addition, one could add a force-init flag to this method.

However, I don't know if existing users rely on the present behaviour, which would then break their programs. I would be suprised though, because the change in the timing is non-deterministic and hence users won't rely on it hopefully.