eProsima / Fast-DDS

The most complete DDS - Proven: Plenty of success cases. Looking for commercial support? Contact info@eprosima.com
https://eprosima.com
Apache License 2.0

FastDDS throughput on fast networks [13606] #1916

Closed ellerre closed 1 year ago

ellerre commented 3 years ago

Hello,

I am trying to benchmark FastDDS on fast Ethernet networks (1 Gbps, 25 Gbps, etc.). I chose the Throughput and Latency tests that you provide in the GitHub repo (v2.3.0, commit bb500a5), but I'm having trouble getting the expected throughput for inter-host communication (latency is fine).

I've slightly modified the test code, since I want a single burst of exactly demand samples, then a sleep of recovery_time, and then the next iteration (no timeout involved). In addition, I'm using RELIABLE reliability, KEEP_ALL history and VOLATILE durability.
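The burst-then-recover loop described above can be sketched as follows. This is only an illustration, not the modified test code: run_burst and send_one are hypothetical names, and send_one stands in for whatever write call the test uses, assumed here to return false when a sample could not be accepted.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical helper: sends exactly `demand` samples back to back, then
// sleeps for `recovery_time`. `send_one` is a stand-in for the real write
// call and is assumed to return false when the sample was not accepted.
inline std::uint64_t run_burst(
        std::uint32_t demand,
        std::chrono::milliseconds recovery_time,
        const std::function<bool()>& send_one)
{
    std::uint64_t sent = 0;
    while (sent < demand)
    {
        if (send_one())
        {
            ++sent;                      // count only accepted samples
        }
        else
        {
            std::this_thread::yield();   // retry so the burst stays exact
        }
    }
    std::this_thread::sleep_for(recovery_time);
    return sent;
}
```

Retrying refused writes, rather than skipping them, is what keeps the burst at exactly demand samples; the price is that the send time then includes any back-pressure from the reliable writer.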

I also want to control which network interface the test uses, and I want to use IP multicast. Hence, I added the following lines to both the publisher and the subscriber (I'm not using the XML file):

    // Replace the builtin transports with a UDPv4 transport restricted to one interface.
    pqos.transport().use_builtin_transports = false;
    auto udp_transport = std::make_shared<eprosima::fastdds::rtps::UDPv4TransportDescriptor>();
    udp_transport->interfaceWhiteList.push_back("my-network-interface");
    pqos.transport().user_transports.push_back(udp_transport);

    // Default multicast locator for user traffic: 239.255.0.1, port 22224.
    eprosima::fastrtps::rtps::Locator_t locator;
    eprosima::fastrtps::rtps::IPLocator::setIPv4(locator, 239, 255, 0, 1);
    locator.port = 22224;
    pqos.wire_protocol().default_multicast_locator_list.push_back(locator);

My testbed consists of two identical machines, each with:

  1. Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz 4 core (4 threads)
  2. Ubuntu 20.04.2 LTS (Linux kernel 5.8.0-44-generic x86_64)
  3. RTL8111/8168/8411 PCI Express 1 Gigabit Ethernet Controller, connected through an HP V1910-24G switch.

I have increased the system-level UDP buffers as follows:

sudo sysctl -w net.core.rmem_default="65536"
sudo sysctl -w net.core.wmem_default="65536"
sudo sysctl -w net.core.rmem_max="2129920"
sudo sysctl -w net.core.wmem_max="2129920"
sudo sysctl -w net.ipv4.udp_mem="102400 873800 2129920"
sudo sysctl -w net.core.netdev_max_backlog="30000"
sudo sysctl -w net.ipv4.ipfrag_high_thresh="8388608"

and verified with iperf that multicast works correctly in my testbed (>950 Mbps raw bandwidth on the server side). However, I am getting very strange results. Sometimes some of the sent messages seem never to be received, even though my QoS settings (RELIABLE, KEEP_ALL) should guarantee delivery, and in those cases performance is very low (or I get strange huge numbers). In general, I've also noticed that the bandwidth is very volatile.
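One way to check that the raised sysctl limits actually reach a socket is to request a buffer size and read back what the kernel granted. A minimal sketch, assuming Linux and plain POSIX sockets; effective_udp_rcvbuf is a hypothetical helper and is not part of Fast DDS or its tests.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: requests `requested` bytes of receive buffer on a
// fresh UDP socket and returns the size the kernel reports afterwards,
// or -1 if any call fails.
inline int effective_udp_rcvbuf(int requested)
{
    const int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
    {
        return -1;
    }

    int granted = -1;
    socklen_t len = sizeof(granted);
    if (::setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) != 0 ||
        ::getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) != 0)
    {
        granted = -1;   // report failure instead of a stale value
    }

    ::close(fd);
    return granted;
}
```

On Linux the kernel caps the request at net.core.rmem_max and reports twice the value it stored, so with the settings above a 4 MiB request would be expected to come back as 4259840 rather than 8388608. A socket that never asks for a size gets net.core.rmem_default, which is 65536 here.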

[            TEST           ][                    PUBLISHER                      ][                            SUBSCRIBER                        ]
[ Bytes,Demand,Recovery Time][Sent Samples,Send Time(us),   Packs/sec,  MBits/sec][Rec Samples,Lost Samples,Rec Time(us),   Packs/sec,  MBits/sec]
[------,------,-------------][------------,-------------,------------,-----------][-----------,------------,------------,------------,-----------]
     32,100000,         2000,       100000,      2483535,   40265.189,     10.308,      100000,           0,     2483694,   40262.607,     10.307
     32,100000,         2000,       100000,      2769085,   36113.007,      9.245,      100000,           0,     2869735,   34846.425,      8.921
     32,100000,         2000,       100000,      3170701,   31538.769,      8.074,      100000,           0,     3201087,   31239.391,      7.997
     32,100000,         2000,       100000,      2726701,   36674.361,      9.389,      100000,           0,     2728071,   36655.941,      9.384
     64,100000,         2000,       100000,      3103221,   32224.582,     16.499,      100000,           0,     3110725,   32146.852,     16.459
     64,100000,         2000,       100000,      2619625,   38173.400,     19.545,      100000,           0,     2687510,   37209.159,     19.051
     64,100000,         2000,       100000,      2982428,   33529.724,     17.167,      100000,           0,     2982657,   33527.151,     17.166
     64,100000,         2000,       100000,      2489205,   40173.475,     20.569,      100000,           0,     2527807,   39559.976,     20.255
    128,100000,         2000,       100000,      3827213,   26128.673,     26.756,      100000,           0,     3827366,   26127.627,     26.755
    128,100000,         2000,       100000,      1962443,   50956.891,     52.180,       36924,           0,    11704459,    3154.695,      3.230
    128,100000,         2000,       100000,      1870241,   53469.038,     54.752,18446744073709467648,       90575,    26585720,693859101365203456.000,710511719797968.375
    128,100000,         2000,       100000,      2896076,   34529.479,     35.358,      100000,           0,     2922527,   34216.958,     35.038
    256,100000,         2000,       100000,      3264326,   30634.191,     62.739,      100000,           0,     3271808,   30564.145,     62.595
    256,100000,         2000,       100000,      2139676,   46736.037,     95.715,       60212,           0,     5868639,   10259.959,     21.012
    256,100000,         2000,       100000,      2131934,   46905.770,     96.063,18446744073709465600,       92989,    26172535,704813040945371904.000,1443457107856121.750
    256,100000,         2000,       100000,      2720196,   36762.057,     75.289,      100000,           0,     2722691,   36728.368,     75.220
    512,100000,         2000,       100000,      2495943,   40065.014,    164.106,      100000,           0,     2515864,   39747.782,    162.807
    512,100000,         2000,       100000,      1701011,   58788.575,    240.798,       17404,           0,    19714464,     882.804,      3.616
    512,100000,         2000,       100000,      1739940,   57473.252,    235.410,18446744073709463552,       89406,    29212738,631462355526461056.000,2586469808236384.500
    512,100000,         2000,       100000,      3079844,   32469.172,    132.994,       95144,           0,     3133215,   30366.250,    124.380
   1024,100000,         2000,       100000,      2760509,   36225.200,    296.757,       98870,           0,     2763209,   35780.860,    293.117
   1024,100000,         2000,       100000,      1446577,   69128.727,    566.303,18446744073709455360,       99014,    29590280,623405520629168384.000,5106938024994148.000
   1024,100000,         2000,       100000,      2161062,   46273.540,    379.073,       28368,           0,    15038730,    1886.330,     15.453
   1024,100000,         2000,       100000,      3310085,   30210.706,    247.486,      100000,           0,     3348699,   29862.340,    244.632
   2048,100000,         2000,       100000,      2876789,   34760.982,    569.524,      100000,           0,     2876866,   34760.044,    569.509
   2048,100000,         2000,       100000,      2863504,   34922.249,    572.166,      100000,           0,     2866040,   34891.349,    571.660
   2048,100000,         2000,       100000,      3370934,   29665.367,    486.037,      100000,           0,     3383160,   29558.167,    484.281
   2048,100000,         2000,       100000,      2913072,   34328.025,    562.430,       55908,           0,     7536325,    7418.470,    121.544
   4096,100000,         2000,       100000,      4355289,   22960.589,    752.373,      100000,           0,     4364965,   22909.693,    750.705
   4096,100000,         2000,       100000,      4333411,   23076.512,    756.171,       34295,           0,    14592362,    2350.202,     77.011
   4096,100000,         2000,       100000,      5090714,   19643.610,    643.682,18446744073709477888,       82532,    28019506,658353650055670784.000,21572932405024220.000
   4096,100000,         2000,       100000,      4378527,   22838.733,    748.380,       30735,           0,    15899871,    1933.035,     63.342
   8192, 50000,         2000,        50000,      3473945,   14392.859,    943.250,       49998,           0,     3474357,   14390.578,    943.101
   8192, 50000,         2000,        50000,      3761899,   13291.159,    871.049,       49998,           0,     3762276,   13289.298,    870.927
   8192, 50000,         2000,        50000,      3741743,   13362.758,    875.742,       49996,           0,     3742179,   13360.132,    875.570
   8192, 50000,         2000,        50000,      3643062,   13724.719,    899.463,       50000,           0,     3643798,   13721.946,    899.281
  16384, 50000,         2000,        50000,      7357181,    6796.081,    890.776,       49814,           0,     7434346,    6700.522,    878.251
  16384, 50000,         2000,        50000,      7469195,    6694.162,    877.417,18446744073709531136,       49987,     8478600,2175682830741217280.000,285171099990912832.000
  16384, 50000,         2000,        50000,      7543276,    6628.420,    868.800,18446744073709529088,       33037,    11238980,1641318327027040000.000,215130875760088160.000
  16384, 50000,         2000,        50000,      7363110,    6790.609,    890.059,       39600,       10400,     9929962,    3987.931,    522.706
  32768, 50000,         2000,        50000,     14135824,    3537.113,    927.233,       49998,           0,    14136279,    3536.857,    927.166
  32768, 50000,         2000,        50000,     14272752,    3503.179,    918.337,       37940,           0,    14609067,    2597.017,    680.792
  32768, 50000,         2000,        50000,     14942856,    3346.081,    877.155,        2297,       38004,    15183826,     151.279,     39.657
  32768, 50000,         2000,        50000,     13958897,    3581.945,    938.985,        9095,           0,    17949962,     506.686,    132.825
  64000, 50000,         2000,        50000,     27577674,    1813.061,    928.287,       38267,           0,    28096586,    1361.980,    697.334
  64000, 50000,         2000,        50000,     27298962,    1831.571,    937.765,18446744073709524992,       38603,    30549181,603837591096222976.000,309164846641266112.000
  64000, 50000,         2000,        50000,     27162068,    1840.802,    942.491,       22884,           0,    29688175,     770.812,    394.656
  64000, 50000,         2000,        50000,     27393268,    1825.266,    934.536,       20672,           0,    29477682,     701.276,    359.053

I have tried to use unicast instead of multicast. In that case, the behavior is much more stable, but performance is generally lower for big message sizes. Plus, I'm still getting those weird huge numbers every now and then. I've also tried to run the same test on much more modern and powerful machines with a 25Gbps network, but again I've obtained similar results.
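The huge Rec Samples values all sit just below 2^64 = 18446744073709551616 and are multiples of 2048, which is the spacing of double-precision values in that range. That pattern is consistent with an unsigned 64-bit subtraction wrapping around and then being printed as a floating-point number. The sketch below only illustrates that mechanism; the function name and the operands are hypothetical, and where such a subtraction happens in the test is not established here.

```cpp
#include <cstdint>

// Hypothetical reconstruction: subtracting a larger unsigned value from a
// smaller one wraps modulo 2^64, and converting the result to double
// rounds it to the nearest multiple of 2048.
inline double wrapped_count_as_double(std::uint64_t first, std::uint64_t last)
{
    const std::uint64_t diff = last - first;   // wraps when last < first
    return static_cast<double>(diff);
}
```

With the illustrative operands first = 84000 and last = 0, the result is 18446744073709467648, the value shown in the third 128-byte row. Dividing that count by the 26585720 us receive time gives about 6.9e17 packs/sec, which matches the magnitude printed in the same row.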

Do you have any hint about what I’m doing wrong? I can provide additional information in case you need that. Thank you!

JLBuenoLopez commented 1 year ago

@ellerre,

Thanks for the report, and sorry for the late reply. A few months ago we took a look at the throughput tests and found some bugs. Some of them have since been tackled, for instance in #3309. The weird results were probably related to that bug. I am going to close this issue, but feel free to reopen it if you get these weird results again.