flexivrobotics / flexiv_rdk

RDK (robotic development kit) for Flexiv robots. Supports C++ and Python. Compatible with Linux, macOS, and Windows.
Apache License 2.0

[BUG] Conditional jump or move depends on uninitialised value(s) error #31

Closed acf986 closed 11 months ago

acf986 commented 11 months ago

Version information

Describe the bug Valgrind reports a Conditional jump or move depends on uninitialised value(s) error:

Thread 9:
==242056== Conditional jump or move depends on uninitialised value(s)
==242056==    at 0x256158: fvr::SchedTask::periodicTask() (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x256F03: std::_Function_handler<FvrSt (), std::_Bind<FvrSt (fvr::SchedTask::*(fvr::SchedTask*))()> >::_M_invoke(std::_Any_data const&) (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x261697: fvr::PosixThread::threadEntryWrapper(void*) (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x4964608: start_thread (pthread_create.c:477)
==242056==    by 0x56A0132: clone (clone.S:95)
==242056==
==242056== Conditional jump or move depends on uninitialised value(s)
==242056==    at 0x23C5BA: fvr::FvrTimeStatistics::setCurAndCalc(double) (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x2560B1: fvr::SchedTask::periodicTask() (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x256F03: std::_Function_handler<FvrSt (), std::_Bind<FvrSt (fvr::SchedTask::*(fvr::SchedTask*))()> >::_M_invoke(std::_Any_data const&) (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x261697: fvr::PosixThread::threadEntryWrapper(void*) (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x4964608: start_thread (pthread_create.c:477)
==242056==    by 0x56A0132: clone (clone.S:95)
==242056==
==242056== Conditional jump or move depends on uninitialised value(s)
==242056==    at 0x23C5CE: fvr::FvrTimeStatistics::setCurAndCalc(double) (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x2560B1: fvr::SchedTask::periodicTask() (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x256F03: std::_Function_handler<FvrSt (), std::_Bind<FvrSt (fvr::SchedTask::*(fvr::SchedTask*))()> >::_M_invoke(std::_Any_data const&) (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x261697: fvr::PosixThread::threadEntryWrapper(void*) (in /home/sp/WorkSpace/moveit_ws/devel/lib/moveit_tutorials/moveit_cpp_direct_robot)
==242056==    by 0x4964608: start_thread (pthread_create.c:477)
==242056==    by 0x56A0132: clone (clone.S:95)

Steps to reproduce Run the RDK under Valgrind. The error is reported in roughly 25% of runs.

Expected behavior Valgrind should not report any errors.

Screenshots NA.

Additional context Please take a look at the report. If it is not a bug, let us know and we will suppress it in Valgrind.
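
For readers unfamiliar with this class of Memcheck report: it is emitted when a branch depends on memory that was never written. The sketch below is a hypothetical running-statistics helper (`TimeStats` and `update()` are made-up names, not the RDK's `fvr::FvrTimeStatistics`, whose source is not shown in this thread). It only illustrates one common cause and its usual remedy, under the assumption that the statistics object compares each new sample against stored minimum and maximum values.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper for illustration only; not part of flexiv_rdk.
struct TimeStats {
    // Default member initializers. Without them, a default-initialized
    // TimeStats would hold indeterminate values, and the comparisons in
    // update() would be the kind of branch Memcheck flags.
    double min_ms = std::numeric_limits<double>::max();
    double max_ms = 0.0;
    double sum_ms = 0.0;
    std::size_t count = 0;

    void update(double sample_ms) {
        assert(sample_ms >= 0.0);                    // durations are non-negative
        if (sample_ms < min_ms) min_ms = sample_ms;  // branch reads min_ms
        if (sample_ms > max_ms) max_ms = sample_ms;  // branch reads max_ms
        sum_ms += sample_ms;
        ++count;
    }

    double mean_ms() const {
        return count == 0 ? 0.0 : sum_ms / static_cast<double>(count);
    }
};
```

If the members were declared without initializers and the object were created as `TimeStats s;`, the first call to `update()` would branch on indeterminate values, which is what Memcheck's `--track-origins=yes` option helps trace back to the allocation.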

pzhu-flexiv commented 11 months ago

@acf986 Thanks for the issue report. It seems this Valgrind warning comes from flexiv::Scheduler. We have started an automated test using Valgrind 3.15.0 on flexiv_rdk/test/test_scheduler.cpp. The automated test was repeated 100 times, during which we were not able to reproduce the conditional-jump warning.

Command used: sudo valgrind --track-origins=yes ./test_scheduler
OS: Ubuntu 20.04-x86_64
Test beginning print: [image]
Test summary print: [image]

acf986 commented 11 months ago

@pzhu-flexiv Hi, thanks very much for your answer. Have you tried including something that takes a significant amount of time to compute, for example a big matrix inversion inside the realtime loop? With Valgrind on, my loop takes around 0.2 s to complete. I understand that this violates the "realtime" requirement, but 1) this only happens when Valgrind is on, and 2) we are not sending any commands to the robot arm when Valgrind is on.
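
One way to quantify how far a cycle overruns its budget, with or without Valgrind, is to time each callback against the scheduler period. The following is a minimal sketch using only the C++ standard library. `DeadlineMonitor` and its methods are hypothetical names introduced here, and the period is assumed to be supplied by the caller rather than queried from the RDK.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper for illustration only; not part of flexiv_rdk.
class DeadlineMonitor {
public:
    explicit DeadlineMonitor(std::chrono::nanoseconds period) : period_(period) {
        assert(period_.count() > 0);
    }

    // Times one cycle of work; returns true if it finished within the period.
    template <typename Fn>
    bool runCycle(Fn&& work) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto elapsed = std::chrono::steady_clock::now() - start;
        return record(std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }

    // Records an already-measured duration. Kept separate from runCycle()
    // so the bookkeeping can be checked without relying on wall-clock time.
    bool record(std::chrono::nanoseconds elapsed) {
        ++cycles_;
        if (elapsed > worst_) worst_ = elapsed;
        if (elapsed > period_) {
            ++overruns_;
            return false;
        }
        return true;
    }

    std::size_t cycles() const { return cycles_; }
    std::size_t overruns() const { return overruns_; }
    std::chrono::nanoseconds worst() const { return worst_; }

private:
    std::chrono::nanoseconds period_;
    std::chrono::nanoseconds worst_{0};
    std::size_t cycles_ = 0;
    std::size_t overruns_ = 0;
};
```

Wrapping the body of the periodic callback in `runCycle()` and printing `overruns()` and `worst()` after the run would show whether the slowdown under Valgrind pushes the loop past its period.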

pzhu-flexiv commented 11 months ago

@acf986

  1. What's the scheduler period you have for the periodic loop? Does everything finish within the specified period?
  2. Are you doing memory allocation for large data objects within the periodic loop every cycle? If so, have you tried creating the data objects only once outside the periodic loop and using them inside the periodic loop without repeated memory allocation?

acf986 commented 11 months ago

@pzhu-flexiv, I am running the scheduler at 1000 Hz. In normal running mode, the code finishes in around 100+ µs. However, with Valgrind, the code takes 0.2 s to finish. The code I am running is a quadratic-programming-based MPC that controls the robot arm through the joint torque interface.

I am not allocating large amounts of data on the fly; everything is pre-allocated.
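
The pre-allocation pattern discussed above can be sketched as follows. This is an illustration only: `TorqueController`, its gains, and its `step()` method are hypothetical and unrelated to the MPC code or to any RDK class. The point is simply that buffers are sized once in the constructor and reused on every cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical controller for illustration only; not part of flexiv_rdk.
class TorqueController {
public:
    // All buffers are allocated here, once, before the periodic loop starts.
    explicit TorqueController(std::size_t dof) : gains_(dof, 1.0), torque_(dof, 0.0) {}

    // Intended to be called every cycle. It writes into the pre-sized
    // buffer and performs no heap allocation of its own.
    const std::vector<double>& step(const std::vector<double>& position_error) {
        assert(position_error.size() == torque_.size());
        for (std::size_t i = 0; i < torque_.size(); ++i) {
            torque_[i] = gains_[i] * position_error[i];
        }
        return torque_;
    }

private:
    std::vector<double> gains_;
    std::vector<double> torque_;
};
```

Returning a reference to the member buffer avoids a copy per cycle; the caller is assumed to consume the result before the next call to `step()`.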

pzhu-flexiv commented 11 months ago

@acf986 In that case, I think the conditional jump might be caused by the 0.2 s loop time with Valgrind enabled. The scheduler keeps calling the callback function at 1 kHz regardless of whether the previous run has finished. So with Valgrind enabled, your 0.2 s loop at cycle k-1 might still be in the middle of some computation, copying data around in memory, when cycle k is forcibly triggered by the scheduler and performs the same operations on the same memory, causing a conflict that triggers the Valgrind warning.
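
If the callback can indeed be entered again while the previous cycle is still running, as described above, one defensive option on the user side is a re-entrancy guard that skips a cycle rather than letting two runs touch the same memory. The sketch below uses only `std::atomic` from the standard library. `OverrunGuard` is a hypothetical name, and nothing here reflects how flexiv::Scheduler is actually implemented.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical guard for illustration only; not part of flexiv_rdk.
class OverrunGuard {
public:
    OverrunGuard() {
        // A lock-based atomic would be a poor fit for a real-time loop.
        assert(busy_.is_lock_free());
    }

    // Runs work() only if no other invocation is in progress.
    // Returns false (and counts a skip) if the previous cycle is still busy.
    // work() is assumed not to throw; otherwise the flag would stay set.
    template <typename Fn>
    bool run(Fn&& work) {
        bool expected = false;
        if (!busy_.compare_exchange_strong(expected, true,
                                           std::memory_order_acq_rel)) {
            skipped_.fetch_add(1, std::memory_order_relaxed);
            return false;
        }
        work();
        busy_.store(false, std::memory_order_release);
        return true;
    }

    std::size_t skipped() const { return skipped_.load(std::memory_order_relaxed); }

private:
    std::atomic<bool> busy_{false};
    std::atomic<std::size_t> skipped_{0};
};
```

Skipping a cycle is only acceptable in a diagnostic run such as the Valgrind session described here, where no commands are sent to the robot; the `skipped()` counter then indicates how often overlap would have occurred.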

acf986 commented 11 months ago

@pzhu-flexiv, thanks very much for your explanation. Is there anywhere I could find more details on the internal mechanisms of the scheduler, such as how it manages threads, memory, thread safety, etc.?

pzhu-flexiv commented 11 months ago

@acf986 I'm afraid there's no detailed documentation about flexiv::Scheduler as of now. We've been using and improving it through the years with internal unit tests. The scheduler has been working reliably under many hard real-time scenarios.

acf986 commented 11 months ago

Thanks very much.