Open gavanderhoorn opened 1 year ago
Summary of earlier findings/comments:
we could be handling message/service/action memory incorrectly. The micro-ROS documentation specifically states:
If the
init
function is called twice for the same message without callingfini
in between previously allocated memory will be leaked
There's *_create()
/*_destroy()
and micro_ros_utilities_create_message_memory(..)
/micro_ros_utilities_destroy_message_memory(..)
in micro_ros_utils
. Calling *_create()
and then *_init()
on sequence fields might also leak memory.
our libmicroros
might be missing some fixes upstream already has: we're using https://github.com/micro-ROS/rmw_microxrcedds/commit/a2a74846971697e4afc8a70b59c9a8352d74195b in our Humble builds, but according to the commit history, a couple commits later https://github.com/micro-ROS/rmw_microxrcedds/pull/280 gets merged (as https://github.com/micro-ROS/rmw_microxrcedds/commit/6df0003fe6d36fbcab77b5a663acba738e265890). From the description, it sounds like this could have something to do with the memory leak we're seeing.
Just as an update to this, I am running iron, which includes the previously mentioned PR (https://github.com/micro-ROS/rmw_microxrcedds/pull/280). But it seems to still have the same memory issues.
Here's a table showing the memory free and delta on my YRC1000 with Iron
# | Free | Delta |
---|---|---|
0 | 797368 | |
1 | 796848 | -520 |
2 | 796800 | -48 |
3 | 796712 | -88 |
4 | 796712 | 0 |
5 | 796752 | +40 |
6 | 796752 | 0 |
7 | 796800 | +48 |
8 | 796568 | -232 |
9 | 796448 | -120 |
10 | 796184 | -264 |
11 | 796072 | -112 |
12 | 796072 | 0 |
13 | 795976 | -96 |
14 | 795920 | -56 |
15 | 795928 | +8 |
16 | 796000 | +72 |
17 | 795888 | -112 |
18 | 795792 | -96 |
19 | 795744 | -48 |
It is still unclear to me where the problem is. But it seems like at the very least, the problem is not solved by the linked PR. I can try to track it down. But with how inconsistent it is, it might be tough to tell what exactly is going wrong.
On my very ugly branch, I have fixed seemingly all of the memory leaks from disconnect/reconnect cycles except for one. I believe that I have narrowed down the remaining leak to g_microRosNodeInfo.support
. When it gets initialized, it allocates 368 bytes, but when it is freed it only frees 344 bytes. I don't know what is happening to the other 24 bytes. But every single disconnect/reconnect cycle (except for the very first one, where some memory is permanently allocated), the total loss is exactly 24 bytes across the whole program, so I am almost certain that this is the only remaining problem. I don't know if this is a problem upstream or a problem with how we are using it.
My branch is a branch off of iron_wip
. There is no particular reason for that, I was just using iron at the time that I decided to try to fix this issue. The memory leak exists for both humble and iron, though.
If anybody has any ideas on how to fix this last leak, please let me know.
I created a new branch off of main and copied over the relevant fix from my fork and made a PR. This will make it more clear what changes I made, and decrease the impact of the memory leak for the time being. The other leak that I mentioned in my previous comment is still present, though.
I believe that I have found the second memory leak. The problem is seemingly this line in rcl/init.c
. Memory is allocated by strdup
, but that memory is never freed. I don't know whose responsibility it is to free that memory, but I added a single line that freed it to our rcl fork. I made a new branch with the change. Using libmicroros built with this change, the memory loss for every cycle other than the first one is 0.
I talked to Ted, and it seems to be an upstream problem, so I'm not suggesting that we merge the branch that I just made into our fork's main, but I wanted to share what I did to get rid of the leak. I think that I will create an issue upstream.
Yes, that is an issue upstream. Seeing as the same code is in ros2/rcl
, I would suggest posting there (if there isn't already an issue tracking this). Please post a comment here linking to it after you've opened it.
I would however suggest to check whether it's been perhaps already fixed in upstream's Humble, Jazzy or Rolling branches and we / micro-ROS is just lagging those. Just to avoid posting about an already resolved issue.
I'm not suggesting that we merge the branch that I just made
indeed, I don't believe that'd be necessary. The leak is very small and it doesn't affect functionality in MotoROS2. I would suggest we wait until upstream patches it and then we can update to those versions.
I created an issue here https://github.com/ros2/rcl/issues/1198
As mentioned in the Known issues & limitations section of the README (here), MotoROS2 currently appears to suffer from a small memory leak.
Testing shows the following values for used and free memory after several consecutive connect and disconnect cycles with a Galactic micro-ROS Agent:
As a table:
No changes to the config file between
rebootsdis-and-re-connects.It's unclear what causes this at this point.