LouKordos / walking_controller

The main walking controller code for the Bipedal Robot.

Code block takes an unusually long time at random points in simulation and causes the robot to break #80

Closed LouKordos closed 3 years ago

LouKordos commented 3 years ago

After running for anywhere between 5 and 60 minutes, the robot suddenly falls, breaks, or explodes in different ways. The most common failure modes are the leg rapidly moving upwards, the foot orientation going crazy, plain falling onto the ground as if no forces were required, and incorrect foot placement, i.e. not tracking r_desired (errors of 3-4 cm compared to the normal 0.5-1 cm).

The controller works for extended periods of time and the plots don't show any signs of instability beforehand, but there is always an execution time spike right before the breakage. It is therefore likely that either some kind of (dead)lock causes a delay, so that outdated torques are applied for too long, or that some part of the code takes unusually long and causes the same delay.

Spike in full iteration time while actual solver time is low, indicating a threading / lock / mutex issue: image

Large error in desired vs. actual foot placement, causing the forces to be applied at an unexpected position, most likely also caused by a lag somewhere: image

Another spike in full MPC iteration time while solver time is much lower: image

LouKordos commented 3 years ago

After adding high-resolution clocks to measure the execution time of each code block, it seems that the logging code block experiences sudden spikes from ~200 µs on average to 250-800 ms. This is repeatable across multiple runs, although the spike occasionally occurs on a different thread because multiple threads are logging at the same time. The solution would be to move the logging to a separate, dedicated thread. That thread would continually write out the entries of a queue onto which the other threads push their log strings.

Spikes in recorded logging time: image image
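The queue-plus-dedicated-thread idea described above could look roughly like the following. This is only a minimal sketch and not the code that ended up in the repository: the class name `AsyncLogger`, its `push` method, and the batching strategy are assumptions made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper, not taken from the repository: control threads call
// push(), and a single background thread performs the slow file writes.
class AsyncLogger {
public:
    explicit AsyncLogger(const std::string& path)
        : file_(path), worker_(&AsyncLogger::run, this) {}

    ~AsyncLogger() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
        assert(queue_.empty());  // the worker drains the queue before exiting
    }

    AsyncLogger(const AsyncLogger&) = delete;
    AsyncLogger& operator=(const AsyncLogger&) = delete;

    // Called from the control threads: a short lock and a move, no file I/O.
    void push(std::string line) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(line));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
            if (queue_.empty()) {
                break;  // only reachable once stop_ has been set
            }
            // Take the whole backlog, then release the lock so that a slow
            // write never blocks the threads calling push().
            std::queue<std::string> batch;
            batch.swap(queue_);
            lock.unlock();
            while (!batch.empty()) {
                file_ << batch.front() << '\n';
                batch.pop();
            }
            lock.lock();
        }
        file_.flush();
    }

    std::ofstream file_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

With something like this, a control thread would only call `logger.push(line)` and return immediately; an I/O stall then delays the log file, not the torque update. The queue here is unbounded, so a very long stall would grow memory instead of blocking, which may or may not be acceptable.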

LouKordos commented 3 years ago

I narrowed this down further to the specific line of code where the data gets written to the file, indicating that I/O lag spikes are the cause.
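For reference, timing a single statement such as the file write can be done with a small wrapper like the one below. This is a sketch under assumptions: `time_block_us` and `is_spike` are hypothetical names, and it uses `std::chrono::steady_clock` because it is guaranteed to be monotonic, whereas `std::chrono::high_resolution_clock` may be an alias for the system clock on some standard libraries.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs a callable once and returns its wall-clock
// duration in microseconds.
template <typename F>
double time_block_us(F&& block) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(block)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(end - start).count();
}

// Hypothetical helper: flags a sample that exceeds `factor` times the
// typical duration, e.g. a 250 ms write against a ~200 us average.
inline bool is_spike(double elapsed_us, double typical_us, double factor) {
    assert(factor > 1.0 && typical_us > 0.0);
    return elapsed_us > factor * typical_us;
}
```

Wrapping only the write call, for example `time_block_us([&] { file << line; })`, and recording every sample for which `is_spike` returns true separates I/O stalls from time spent formatting the log string.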

LouKordos commented 3 years ago

#84 fixes this, closing.