eclipse-cyclonedds / cyclonedds

Eclipse Cyclone DDS project
https://projects.eclipse.org/projects/iot.cyclonedds

The DDS API execution takes a long time and cannot meet real-time requirements. #2107

Open yanzhang920817 opened 1 day ago

yanzhang920817 commented 1 day ago

DDS is described as having good real-time performance, but in my tests the execution time of the DDS API calls themselves is not stable, as the following example shows:


int DDSUtil::CreatePublisher(PublisherInfo &pub)
{
    // create dds publisher
    dds_qos_t *qos = NULL; // allocated per topic below; dds_delete_qos(NULL) is a no-op
    /* Create a Participant. */
    pub.participant = dds_create_participant(DDS_DOMAIN_DEFAULT, NULL, NULL);
    if (pub.participant < 0)
    {
        DDS_FATAL("dds_create_participant: %s\n", dds_strretcode(-pub.participant));
        goto err_free_pub;
    }
    printf("%s: create participant successfully\n", __FUNCTION__);

    /* Create a Topic. */
    // pub.topic = dds_create_topic(pub.participant, &pub.topicInfo.topicDesc, pub.topicInfo.topicName, NULL, NULL);
    if (!strcmp(pub.topicName, "rt/lowstate"))
    {
        pub.topic = dds_create_topic(pub.participant, &LowState__desc, pub.topicName, NULL, NULL);
        qos = dds_create_qos();
        dds_qset_history(qos, DDS_HISTORY_KEEP_LAST, 1);
    }
    else if (!strcmp(pub.topicName, "humaniod/state"))
    {
        pub.topic = dds_create_topic(pub.participant, &HumaniodState__desc, pub.topicName, NULL, NULL);
        qos = dds_create_qos();
        dds_qset_history(qos, DDS_HISTORY_KEEP_LAST, 1);
    }
    else if (!strcmp(pub.topicName, "rt/inspire/state"))
    {
        pub.topic = dds_create_topic(pub.participant, &InspireState__desc, pub.topicName, NULL, NULL);
        qos = dds_create_qos();
        dds_qset_history(qos, DDS_HISTORY_KEEP_LAST, 1);
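    }
    else
    {
        // Unknown topic name: pub.topic and qos were never set, so bail out before they are used.
        fprintf(stderr, "%s: unsupported topic name: %s\n", __FUNCTION__, pub.topicName);
        dds_delete(pub.participant);
        return -1;
    }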
    if (pub.topic < 0)
    {
        DDS_FATAL("dds_create_topic: %s\n", dds_strretcode(-pub.topic));
        goto err_delete_participant;
    }
    printf("%s: create topic successfully\n", __FUNCTION__);

    /* Create a Writer. */
    pub.writer = dds_create_writer(pub.participant, pub.topic, qos, NULL);
    if (pub.writer < 0)
    {
        DDS_FATAL("dds_create_writer: %s\n", dds_strretcode(-pub.writer));
        goto err_delete_topic;
    }
    printf("%s: create writer successfully\n", __FUNCTION__);
    fflush(stdout);
    dds_delete_qos(qos);
    return 0;

err_delete_writer:
    dds_delete(pub.writer);
err_delete_topic:
    dds_delete(pub.topic);
err_delete_participant:
    dds_delete(pub.participant);
    dds_delete_qos(qos);
err_free_pub:
    return -1;
}
int DDSUtil::PublishLowState(void)
{
    struct timespec begin, end1, end2, end3;
    long timer1 = 0, timer2 = 0, timer3 = 0; // durations in microseconds
    static long maxTimer1 = 0, maxTimer2 = 0, maxTimer3 = 0;
    dds_return_t rc = 0;
    // get data
    clock_gettime(CLOCK_MONOTONIC, &begin);
    // 1 game pad
    GamepadHandler::getInstance().DealGPData(gLowState.wireless_remote);
    clock_gettime(CLOCK_MONOTONIC, &end1);
    // 2 axis
    // for (int i = 0; i < MAX_AXIS; i++) {
    //     gLowState.motor_state[i].mode = 1;
    //     gLowState.motor_state[i].q = 2;
    //     gLowState.motor_state[i].dq = 3;
    // }
    // 3 imu
    IMUHandler::getInstance().GetData(&gLowState.imu_state);
    clock_gettime(CLOCK_MONOTONIC, &end2);
    // publish
    rc = dds_write(gLowStatePub.writer, &gLowState);
    if (rc != DDS_RETCODE_OK)
    {
        DDS_FATAL("dds_write: %s\n", dds_strretcode(-rc));
        return -1;
    }
    // printf("%s: write successfully!\n", __FUNCTION__);
    clock_gettime(CLOCK_MONOTONIC, &end3);

    timer1 = (end1.tv_sec - begin.tv_sec) * 1000000 +
             (end1.tv_nsec - begin.tv_nsec) / 1000;
    timer2 = (end2.tv_sec - end1.tv_sec) * 1000000 +
             (end2.tv_nsec - end1.tv_nsec) / 1000;
    timer3 = (end3.tv_sec - end2.tv_sec) * 1000000 +
             (end3.tv_nsec - end2.tv_nsec) / 1000;

    if (timer1 > maxTimer1)
    {
        maxTimer1 = timer1;
    }
    if (timer2 > maxTimer2)
    {
        maxTimer2 = timer2;
    }
    if (timer3 > maxTimer3)
    {
        maxTimer3 = timer3;
    }
    static int i = 0;
    if (i++ % 50000 == 0)
    {
        printf("a part of lowstate, timer1=%ld, maxTimer1=%ld\n", timer1, maxTimer1);
        printf("a part of lowstate, timer2=%ld, maxTimer2=%ld\n", timer2, maxTimer2);
        printf("a part of lowstate, timer3=%ld, maxTimer3=%ld\n", timer3, maxTimer3);
    }
    return 0;
}

When tested on Ubuntu 20.04 with the RT patch, CPU utilization was about 10%, yet the execution time of dds_write and dds_take varied between 40 us and 700 us, which is very unstable. I am using cyclonedds-0.10.5.
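The example above only tracks the maximum per step, which hides how often the slow calls occur. A minimal sketch of a richer measurement follows, assuming the same CLOCK_MONOTONIC timestamps; `elapsed_ns` and `LatencyStats` are hypothetical helper names, not part of Cyclone DDS.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Elapsed time from `start` to `end` in nanoseconds (both taken from CLOCK_MONOTONIC).
int64_t elapsed_ns(const timespec &start, const timespec &end)
{
    return (static_cast<int64_t>(end.tv_sec) - start.tv_sec) * 1000000000LL +
           (end.tv_nsec - start.tv_nsec);
}

// Hypothetical helper: running min/max/mean of call durations.
struct LatencyStats
{
    int64_t min_ns = INT64_MAX;
    int64_t max_ns = 0;
    int64_t sum_ns = 0;
    uint64_t count = 0;

    void add(int64_t ns)
    {
        if (ns < min_ns) min_ns = ns;
        if (ns > max_ns) max_ns = ns;
        sum_ns += ns;
        ++count;
    }

    // Mean duration in microseconds; requires at least one sample.
    double mean_us() const
    {
        assert(count > 0);
        return static_cast<double>(sum_ns) / static_cast<double>(count) / 1000.0;
    }
};
```

In PublishLowState this could replace the maxTimer3 bookkeeping: take one timestamp directly before and one directly after dds_write, then call `stats.add(elapsed_ns(before, after))` on a static `LatencyStats` instance.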

 cat /dobot/userdata/project/dds/cyclonedds.xml
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain Id="any">
        <General>
            <Interfaces>
                <NetworkInterface autodetermine="false" address="192.168.5.1" priority="default" multicast="false" />
            </Interfaces>
            <AllowMulticast>default</AllowMulticast>
            <MaxMessageSize>65500B</MaxMessageSize>
        </General>
        <Discovery>
            <EnableTopicDiscoveryEndpoints>true</EnableTopicDiscoveryEndpoints>
        </Discovery>
        <Internal>
            <Watermarks>
                <WhcHigh>500kB</WhcHigh>
            </Watermarks>
        </Internal>
    </Domain>
</CycloneDDS>
yanzhang920817 commented 1 day ago

@hansvanthag Hello, do you have any questions or suggestions? I will run some related tests. Thank you.

hansvanthag commented 1 day ago

As was stated (on the Discord support channel), there are multiple topics (of various sizes) being communicated alongside the 1 kHz writes of 300-byte samples (for which the write and read execution times are being measured). So I have the following suggestions:

  1. Can you retry the test without those other topics being communicated 'in parallel'?
  2. I assume a network is involved, and if so: what network (100 Mbps, 1 Gbps, ...) is being used (to rule out congestion)?
  3. The top output shown didn't indicate any threads with high per-core CPU usage, which is puzzling to us.
  4. If I'm right that you're using KEEP_LAST(1) writer history and (supposedly) best-effort reliability, that makes it even stranger.

Therefore, the request for a reproducer stands, as we're pretty sure that on 'your' machine (2.4 GHz i5) a 1 kHz writer of 300-byte samples shouldn't be that slow. Note that jitter (on a non-realtime OS) can be caused by many things, so you might want to try running the app at a high priority (nice --20) to see if that reduces the write-time jitter. Finally, Cyclone bundles performance tests (pubsub/roundtrip) that you could try to see if those also behave strangely.
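A reproducer along these lines could be as small as a fixed-rate loop that times one call per period. The sketch below is only an assumption about what such a skeleton might look like: `advance_deadline` and `worst_case_us` are hypothetical names, and the `work` callback stands in for a single dds_write of one sample.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <functional>

// Advance an absolute deadline by `period_ns`, keeping tv_nsec in [0, 1e9).
void advance_deadline(timespec &t, long period_ns)
{
    t.tv_nsec += period_ns;
    while (t.tv_nsec >= 1000000000L)
    {
        t.tv_nsec -= 1000000000L;
        t.tv_sec += 1;
    }
}

// Call `work` once per period and return the longest observed duration
// of a single call, in microseconds.
int64_t worst_case_us(long period_ns, int iterations, const std::function<void()> &work)
{
    assert(period_ns > 0);
    timespec next, t0, t1;
    int64_t worst = 0;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < iterations; ++i)
    {
        advance_deadline(next, period_ns);
        // Sleep until an absolute deadline so the period does not drift.
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        work(); // e.g. a single dds_write() of one sample
        clock_gettime(CLOCK_MONOTONIC, &t1);

        int64_t ns = (static_cast<int64_t>(t1.tv_sec) - t0.tv_sec) * 1000000000LL +
                     (t1.tv_nsec - t0.tv_nsec);
        int64_t us = ns / 1000;
        if (us > worst) worst = us;
    }
    return worst;
}
```

For the 1 kHz case discussed here, the call would be `worst_case_us(1000000L, 50000, write_one_sample)`, where `write_one_sample` is whatever function wraps the dds_write call. Running it once with only this writer and once with the other topics active would separate the two cases from suggestion 1.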

yanzhang920817 commented 1 day ago

My operating system is a real-time system with the RT patch applied:

uname -a
Linux dobot-IB-ITLU-TW01B 6.1.0-rt5 #1 SMP PREEMPT_RT Sun Oct 6 13:47:58 CST 2024 x86_64 x86_64 x86_64 GNU/Linux

I used chrt to give my program the highest priority, and my tests show that this reduces the timing jitter. (I had isolated cores 0 and 1 before, but that had no obvious effect.)

chrt -f 99 taskset -c 0,1 ./host
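The same scheduling setup can also be requested from inside the program instead of on the command line. This is a rough, Linux-specific sketch using standard system calls; `make_cpu_set` and `enter_realtime` are hypothetical helper names, and adding mlockall is an extra step that the chrt/taskset command does not perform.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <initializer_list>
#include <sched.h>
#include <sys/mman.h>

// Build a CPU mask from a list of core indices.
cpu_set_t make_cpu_set(std::initializer_list<int> cores)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int c : cores)
        CPU_SET(c, &set);
    return set;
}

// Rough in-process equivalent of `chrt -f <prio> taskset -c <cores>`.
// Returns true only if every step succeeded; this normally needs root
// or the CAP_SYS_NICE and CAP_IPC_LOCK capabilities.
bool enter_realtime(int fifo_priority, std::initializer_list<int> cores)
{
    assert(fifo_priority >= 1 && fifo_priority <= 99); // Linux SCHED_FIFO range
    bool ok = true;

    cpu_set_t set = make_cpu_set(cores);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
    {
        fprintf(stderr, "sched_setaffinity: %s\n", strerror(errno));
        ok = false;
    }

    sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = fifo_priority;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
    {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        ok = false;
    }

    // Lock current and future pages in RAM so page faults do not add latency.
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
    {
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
        ok = false;
    }
    return ok;
}
```

Calling `enter_realtime(99, {0, 1})` at the start of main, before dds_create_participant, would mirror the command above. These calls affect the calling thread; whether the threads Cyclone DDS creates internally pick up the same policy and affinity is an assumption that should be checked, for example with `top -H`.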

yanzhang920817 commented 1 day ago

  1. I will retry the test.
  2. I am communicating locally, and the network interface set in the XML is a gigabit port. Even for local communication, would setting a 100 Mbps interface in the XML affect the execution time?
  3. After running chrt -f 99 taskset -c 0,1 ./host, the timing jitter is within 100 us.
  4. Turbo Boost is turned on, and all current test data was collected with Turbo Boost enabled.