ApexAI / performance_test

**This project is deprecated** Go to https://gitlab.com/ApexAI/performance_test

Opendds #100

Closed dejanpan closed 4 years ago

dejanpan commented 5 years ago

This change is Reviewable

daggarwa commented 5 years ago

@iourigordon I have rebased your branch, but it's currently failing in GitLab CI because OpenDDS is not installed. The error is:

/bin/sh: 1: /opt/opendds/share/dds/bin/opendds_idl: not found

I believe we will need to modify .gitlab-ci.yml to include the OpenDDS installation.

daggarwa commented 4 years ago

@iourigordon Any updates on this? Are you stuck somewhere?

iourigordon commented 4 years ago

Making progress.

topics.hpp is done, looking into resource_manager now.

If all goes well tonight I'll commit my changes

Regards, Iouri

daggarwa commented 4 years ago

@iourigordon Sounds great! Thanks so much for the update.

daggarwa commented 4 years ago

@iourigordon Any updates after that? Are you having any problems getting this through? Can I help with something?

iourigordon commented 4 years ago

I cleaned up the CMake files and I think I zeroed in on the issue last night. DDS_DYN_LIBS has no spaces in it, so all the libs are combined into one giant string.

I was going through the CMake docs trying to figure out how to create a space-separated string, but got nothing so far.

That's what I'll be looking at tonight.

Do you know how to do it in CMake?

iourigordon commented 4 years ago

@daggarwa @dejanpan @kylemarcey did you try to build with the latest changes? Any thoughts on OpenDDS installation?

daggarwa commented 4 years ago

Hi Iouri

I will be having a look at this today. Will update you after that. Thanks

daggarwa commented 4 years ago

@iourigordon I get the following error when I set the reliability setting to "reliable" for an experiment:

divya.aggarwal@ade:~/perf_test_ws$ ros2 run performance_test perf_test -c OpenDDS -l log -t Array1k --max_runtime 30 --history_depth 100 --reliable -r 100
Experiment id: 2fe17569-f0a0-41c8-b438-618b138c2125
Performance Test Version: e7d9605
Logfile name: log_Array1k_25-10-2019_17-43-28
Communication mean: OpenDDS
RMW Implementation: rmw_apex_dds
DDS domain id: 0
QOS: Reliability: RELIABLE Durability: VOLATILE History kind: KEEP_ALL History depth: 100 Sync. pub/sub: 0
Publishing rate: 100
Topic name: Array1k
Maximum runtime (sec): 30
Number of publishers: 1
Number of subscribers:1
Memory check enabled: 0
Use ros SHM: 0
Use single participant: 0
Not using waitset: 0
Not using Connext DDS Micro INTRA: 0
With security: 0
Roundtrip Mode: NONE
---EXPERIMENT-START---
T_experiment,   T_loop, received,   sent,   lost,   relative_loss,  data_received,  latency_min (ms),   latency_max (ms),   latency_mean (ms),  latency_variance (ms),  pub_loop_res_min (ms),  pub_loop_res_max (ms),  pub_loop_res_mean (ms), pub_loop_res_variance (ms), sub_loop_res_min (ms),  sub_loop_res_max (ms),  sub_loop_res_mean (ms), sub_loop_res_variance (ms), ru_utime,   ru_stime,   ru_maxrss,  ru_ixrss,   ru_idrss,   ru_isrss,   ru_minflt,  ru_majflt,  ru_nswap,   ru_inblock, ru_oublock, ru_msgsnd,  ru_msgrcv,  ru_nsignals,    ru_nvcsw,   ru_nivcsw,  cpu_usage (%),  
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to write to sample

daggarwa commented 4 years ago

@iourigordon Any updates on this yet?

iourigordon commented 4 years ago

That slipped under my radar; I was looking into adding more topics. Do you want me to add more topics or look into this?

daggarwa commented 4 years ago

I guess we can look into this first. Adding topics isn't too complicated, right?

iourigordon commented 4 years ago

Not at all, it just took me some time to go through the code and find all the places that need to be modified. Adding Array16k_ took me 10 mins including compiling and running. I'll look into the reliability issue tonight.

iourigordon commented 4 years ago

@daggarwa

I committed my change for reliable QoS. I was not setting the blocking time on the data reader and the data writer.

I'll attach my log in the e-mail

daggarwa commented 4 years ago

Thanks. A couple of follow-up issues for reliable QoS:

  1. I tested the configuration with the Reliability QoS set to reliable and the History QoS set to keep_last. It threw an error during the experiment run and terminated the performance test process:

    divya.aggarwal@ade:~/perf_test_ws$ ros2 run performance_test perf_test -c OpenDDS -l log -t Array1k --max_runtime 30 --history_depth 100 --reliable -r 100 --keep_last
    RMW Implementation: rmw_apex_dds
    Experiment id: 8b1cdd17-90f3-4c7e-b546-f42fcf26d73a
    Performance Test Version: 558a4b5
    Logfile name: log_Array1k_29-10-2019_11-23-34
    Communication mean: OpenDDS
    RMW Implementation: rmw_apex_dds
    DDS domain id: 0
    QOS: Reliability: RELIABLE Durability: VOLATILE History kind: KEEP_LAST History depth: 100 Sync. pub/sub: 0
    Publishing rate: 100
    Topic name: Array1k
    Maximum runtime (sec): 30
    Number of publishers: 1
    Number of subscribers:1
    Memory check enabled: 0
    Use ros SHM: 0
    Use single participant: 0
    Not using waitset: 0
    Not using Connext DDS Micro INTRA: 0
    With security: 0
    Roundtrip Mode: NONE
    ---EXPERIMENT-START---
    T_experiment,   T_loop, received,   sent,   lost,   relative_loss,  data_received,  latency_min (ms),   latency_max (ms),   latency_mean (ms),  latency_variance (ms),  pub_loop_res_min (ms),  pub_loop_res_max (ms),  pub_loop_res_mean (ms), pub_loop_res_variance (ms), sub_loop_res_min (ms),  sub_loop_res_max (ms),  sub_loop_res_mean (ms), sub_loop_res_variance (ms), ru_utime,   ru_stime,   ru_maxrss,  ru_ixrss,   ru_idrss,   ru_isrss,   ru_minflt,  ru_majflt,  ru_nswap,   ru_inblock, ru_oublock, ru_msgsnd,  ru_msgrcv,  ru_nsignals,    ru_nvcsw,   ru_nivcsw,  cpu_usage (%),  
    (661|673) ERROR: PublisherImpl::create_datawriter, inconsistent qos.
    (661|674) ERROR: SubscriberImpl::create_datareader, inconsistent qos.
    terminate called recursively
    terminate called after throwing an instance of 'std::runtime_error'
    what():  Could not create datawriter
  2. I tested the configuration with the Reliability QoS set to reliable and the Durability QoS set to transient_local. It threw an error at experiment start and terminated the performance test process:

divya.aggarwal@ade:~/perf_test_ws$ ros2 run performance_test perf_test -c OpenDDS -l log -t Array1k --max_runtime 30 --history_depth 100 --reliable -r 100 --transient
RMW Implementation: rmw_apex_dds
Experiment id: 51741745-aa76-42bb-b488-67bcfea75f34
Performance Test Version: 558a4b5
Logfile name: log_Array1k_29-10-2019_11-24-59
Communication mean: OpenDDS
RMW Implementation: rmw_apex_dds
DDS domain id: 0
QOS: Reliability: RELIABLE Durability: TRANSIENT_LOCAL History kind: KEEP_ALL History depth: 100 Sync. pub/sub: 0
Publishing rate: 100
Topic name: Array1k
Maximum runtime (sec): 30
Number of publishers: 1
Number of subscribers:1
Memory check enabled: 0
Use ros SHM: 0
Use single participant: 0
Not using waitset: 0
Not using Connext DDS Micro INTRA: 0
With security: 0
Roundtrip Mode: NONE
---EXPERIMENT-START---
T_experiment,   T_loop, received,   sent,   lost,   relative_loss,  data_received,  latency_min (ms),   latency_max (ms),   latency_mean (ms),latency_variance (ms),    pub_loop_res_min (ms),  pub_loop_res_max (ms),  pub_loop_res_mean (ms), pub_loop_res_variance (ms), sub_loop_res_min (ms),  sub_loop_res_max (ms),  sub_loop_res_mean (ms), sub_loop_res_variance (ms), ru_utime,   ru_stime,   ru_maxrss,  ru_ixrss,   ru_idrss,   ru_isrss,   ru_minflt,  ru_majflt,  ru_nswap,   ru_inblock, ru_oublock, ru_msgsnd,  ru_msgrcv,  ru_nsignals,    ru_nvcsw,   ru_nivcsw,  cpu_usage (%),  
1.000113,       1.000110,       31,     32,     0,      0.00,       33273,      0.6141,     2.414,      0.9611,     7.614e-05,      7.685,      9.644,      9.382,      9.653e-05,      -0.3794,        7.592,      0.1159,     0.001919,       0.06837,        0.05741,        64868,      0,      0,      0,      11327,      0,      0,      0,      8,      0,      0,      0,      460,        3,      2.642e-05,      
2.000896,       1.000127,       0,      0,      0,      -nan,       0,      inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       0.0759,     0.06983,        64868,      0,      0,      0,      11328,      0,      0,      0,8,        0,      0,      0,      677,        3,      0.1668,     
3.001523,       1.000127,       0,      0,      0,      -nan,       0,      inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       0.08032,        0.08299,        64868,      0,      0,      0,      11328,      0,      0,      0,8,        0,      0,      0,      890,        3,      0.2498,     
4.002120,       1.000084,       0,      0,      0,      -nan,       0,      inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       0.08812,        0.09316,        64868,      0,      0,      0,      11328,      0,      0,      0,8,        0,      0,      0,      1105,       3,      0.08368,        
5.002782,       1.000159,       0,      0,      0,      -nan,       0,      inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       0.09018,        0.1031,     64868,      0,      0,      0,      11328,      0,      0,      0,      8,      0,      0,      0,      1318,       4,      0.1663,     
6.003032,       1.000126,       0,      0,      0,      -nan,       0,      inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       0.102,      0.1091,     64868,      0,      0,      0,      11328,      0,      0,      0,      8,      0,      0,      0,      1531,       4,      0.08368,        
7.003761,       1.000222,       0,      0,      0,      -nan,       0,      inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       0.11,       0.1191,     64868,      0,      0,      0,      11328,      0,      0,      0,      8,      0,      0,      0,      1743,       4,      0.08347,        
8.004459,       1.000086,       0,      0,      0,      -nan,       0,      inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       0.1126,     0.131,      64868,      0,      0,      0,      11328,      0,      0,      0,      8,      0,      0,      0,      1957,       4,      0.2502,     
9.005151,       1.000132,       0,      0,      0,      -nan,       0,      inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       0.1178,     0.1432,     64868,      0,      0,      0,      11328,      0,      0,      0,      8,      0,      0,      0,      2171,       4,      0.08368,        
10.005910,      1.000256,       0,      0,      0,      -nan,       0,      inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       inf,        -inf,       0,      -nan,       0.1243,     0.1519,     64868,      0,      0,      0,      11328,      0,      0,      0,      8,      0,      0,      0,      2386,       4,      0.1675,     
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to write to sample
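
A quick way to spot where a run like this stalls is to scan the CSV rows for intervals in which nothing was received. This is only a rough sketch: it assumes the layout seen in the output above (a header row starting with T_experiment, then comma-separated rows), and find_stalled_intervals is a hypothetical helper, not part of performance_test.

```python
import csv
import io


def find_stalled_intervals(log_text):
    """Return the T_experiment values of rows where received == 0.

    Assumes the layout seen in the perf_test output above: a header row
    starting with "T_experiment", followed by comma-separated data rows.
    """
    header = None
    stalled = []
    for row in csv.reader(io.StringIO(log_text)):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue
        if cells[0] == "T_experiment":
            header = cells
            continue
        if header is None:
            continue  # still in the preamble before the CSV header
        try:
            t_experiment = float(cells[0])
        except ValueError:
            continue  # not a data row (e.g. "terminate called ...")
        received_idx = header.index("received")
        if len(cells) <= received_idx:
            continue  # truncated row, nothing to check
        if int(cells[received_idx]) == 0:
            stalled.append(t_experiment)
    return stalled
```

Applied to the transient_local run above, this should flag the intervals from 2.000896 through 10.005910, since only the first interval has a non-zero received count.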

iourigordon commented 4 years ago

I'll look into these tonight

daggarwa commented 4 years ago

Okay, sounds good. Just to confirm, the QoS configurations we need to make sure are working are:

  1. No flags: if you specify neither --reliable (Reliability) nor --transient (Durability), the defaults are Reliability: best_effort and Durability: volatile
  2. --transient
  3. --reliable
  4. --reliable --transient

We also need to make sure that all of the above work with the --keep_last History QoS setting.
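
To keep track of the combinations listed above, something like the following could generate the full set of command lines. It is a sketch only: the base command simply reuses the flags from the runs earlier in this thread, and qos_test_matrix is a hypothetical helper name.

```python
import itertools

# Base command copied from the runs earlier in this thread.
BASE_CMD = [
    "ros2", "run", "performance_test", "perf_test",
    "-c", "OpenDDS", "-l", "log", "-t", "Array1k",
    "--max_runtime", "30", "--history_depth", "100", "-r", "100",
]


def qos_test_matrix():
    """Return one command line per combination of the three QoS flags."""
    commands = []
    for reliable, transient, keep_last in itertools.product([False, True], repeat=3):
        cmd = list(BASE_CMD)
        if reliable:
            cmd.append("--reliable")
        if transient:
            cmd.append("--transient")
        if keep_last:
            cmd.append("--keep_last")
        commands.append(" ".join(cmd))
    return commands


if __name__ == "__main__":
    for command in qos_test_matrix():
        print(command)
```

That gives eight commands: the four Reliability/Durability cases, each with and without --keep_last.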

iourigordon commented 4 years ago

I did a bit of research: --transient works fine alone, but --reliable --transient fails the same way you saw. There is one more QoS policy called durability_service that sets the cache parameters for transient mode. It's similar to resource_limits but separate. I tried setting it up, but still no luck. I'll start instrumenting the code for more debug messages tomorrow.

daggarwa commented 4 years ago

Oh okay sounds good.

daggarwa commented 4 years ago

@iourigordon Any updates here?

iourigordon commented 4 years ago

Only bad ones. What happens is that the data writer cache gets full and samples are not cleared out. The data reader seems to receive them. I found OpenDDS tests that work with reliable and transient local QoS. Looking at those is the plan for tonight.
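
To illustrate the failure mode described above, here is a toy model. It is not OpenDDS code: ToyWriterCache, max_samples and acknowledge are made-up names, and the model fails a write immediately instead of blocking for a maximum blocking time first, as a reliable writer would.

```python
from collections import deque


class ToyWriterCache:
    """Toy model of a reliable writer history with a sample limit.

    `max_samples` stands in for a resource limit, and `acknowledge`
    stands in for whatever frees samples once readers have them.
    """

    def __init__(self, max_samples):
        self.max_samples = max_samples
        self.unacked = deque()

    def write(self, sample):
        """Return True if the sample was stored, False if the cache is full."""
        if len(self.unacked) >= self.max_samples:
            return False  # a real writer would block, then time out
        self.unacked.append(sample)
        return True

    def acknowledge(self, count):
        """Drop the `count` oldest samples, as if readers had acknowledged them."""
        for _ in range(min(count, len(self.unacked))):
            self.unacked.popleft()
```

If acknowledge is never called, every write after the first max_samples fails, which looks a lot like the "Failed to write to sample" error: samples go in but are never released.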

iourigordon commented 4 years ago

Oh, and one question. MicroDDS sets resource limits to 32 samples, while cyclonedds does not. Do you know the logic behind this?

daggarwa commented 4 years ago

@iourigordon Sorry, just to clarify your last update: the observation is that with OpenDDS, for reliable and transient local, the samples from the data reader are not getting cleared out? Can you also elaborate on the OpenDDS tests you found that work with reliable and transient local QoS? I am not sure I understand this.

iourigordon commented 4 years ago

I've instrumented the code so that DCPS debug messages are printed to the console. From these you can see that the writer fails to push messages into the cache.

The OpenDDS source tree comes with a lot of tests to verify functionality. That's where I found tests for reliable and transient QoS. I'll see what is different between those tests and perf_test.

I tried setting a lot of parameters in the DURABILITY and DURABILITY_SERVICE QoS yesterday, and so far it has not worked. I am not sure at this point whether the problem is with DURABILITY/DURABILITY_SERVICE. It might be with the resource limits as well.

jpsamper2009 commented 4 years ago

Continuing in Gitlab https://gitlab.com/ApexAI/performance_test/merge_requests/100. Closing this one