Yaskawa-Global / motoros2

ROS 2 (rcl, rclc & micro-ROS) node for MotoPlus-compatible Yaskawa Motoman robot controllers
101 stars 21 forks source link

Possible memory leak (after dis-, then reconnect of Agent) #35

Open gavanderhoorn opened 1 year ago

gavanderhoorn commented 1 year ago

As mentioned in the Known issues & limitations section of the README (here), MotoROS2 currently appears to suffer from a small memory leak.

Testing shows the following values for used and free memory after several consecutive connect and disconnect cycles with a Galactic micro-ROS Agent:

Initialization complete. Memory available: (962992) bytes. Memory in use: (85584) bytes
Initialization complete. Memory available: (962992) bytes. Memory in use: (85584) bytes
Initialization complete. Memory available: (962552) bytes. Memory in use: (86024) bytes
Initialization complete. Memory available: (962280) bytes. Memory in use: (86296) bytes
Initialization complete. Memory available: (962280) bytes. Memory in use: (86296) bytes
Initialization complete. Memory available: (962280) bytes. Memory in use: (86296) bytes
Initialization complete. Memory available: (962256) bytes. Memory in use: (86320) bytes
Initialization complete. Memory available: (962224) bytes. Memory in use: (86352) bytes

As a table:

# Free Delta In use Delta
0 962992 85584
1 962992 0 85584 0
2 962552 -440 86024 440
3 962280 -272 86296 272
4 962280 0 86296 0
5 962280 0 86296 0
6 962256 -24 86320 24
7 962224 -32 86352 32

No changes to the config file between reboots dis-and-re-connects.

It's unclear what causes this at this point.

gavanderhoorn commented 1 year ago

Summary of earlier findings/comments:

jimmy-mcelwain commented 2 weeks ago

Just as an update to this, I am running iron, which includes the previously mentioned PR (https://github.com/micro-ROS/rmw_microxrcedds/pull/280). But it seems to still have the same memory issues.

Here's a table showing the memory free and delta on my YRC1000 with Iron

# Free Delta
0 797368
1 796848 -520
2 796800 -48
3 796712 -88
4 796712 0
5 796752 +40
6 796752 0
7 796800 +48
8 796568 -232
9 796448 -120
10 796184 -264
11 796072 -112
12 796072 0
13 795976 -96
14 795920 -56
15 795928 +8
16 796000 +72
17 795888 -112
18 795792 -96
19 795744 -48

It is still unclear to me where the problem is. But it seems like at the very least, the problem is not solved by the linked PR. I can try to track it down. But with how inconsistent it is, it might be tough to tell what exactly is going wrong.

jimmy-mcelwain commented 2 weeks ago

On my very ugly branch, I have fixed seemingly all of the memory leaks from disconnect/reconnect cycles except for one. I believe that I have narrowed down the remaining leak to g_microRosNodeInfo.support. When it gets initialized, it allocates 368 bytes, but when it is freed it only frees 344 bytes. I don't know what is happening to the other 24 bytes. But every single disconnect/reconnect cycle (except for the very first one, where some memory is permanently allocated), the total loss is exactly 24 bytes across the whole program, so I am almost certain that this is the only remaining problem. I don't know if this is a problem upstream or a problem with how we are using it.

My branch is a branch off of iron_wip. There is no particular reason for that, I was just using iron at the time that I decided to try to fix this issue. The memory leak exists for both humble and iron, though.

If anybody has any ideas on how to fix this last leak, please let me know.

jimmy-mcelwain commented 2 weeks ago

I created a new branch off of main and copied over the relevant fix from my fork and made a PR. This will make it more clear what changes I made, and decrease the impact of the memory leak for the time being. The other leak that I mentioned in my previous comment is still present, though.

jimmy-mcelwain commented 2 weeks ago

I believe that I have found the second memory leak. The problem is seemingly this line in rcl/init.c. Memory is allocated by strdup, but that memory is never freed. I don't know whose responsibility it is to free that memory, but I added a single line that freed it to our rcl fork. I made a new branch with the change. Using libmicroros built with this change, the memory loss for every cycle other than the first one is 0.

I talked to Ted, and it seems to be an upstream problem, so I'm not suggesting that we merge the branch that I just made into our fork's main, but I wanted to share what I did to get rid of the leak. I think that I will create an issue upstream.

gavanderhoorn commented 2 weeks ago

Yes, that is an issue upstream. Seeing as the same code is in ros2/rcl, I would suggest posting there (if there isn't already an issue tracking this). Please post a comment here linking to it after you've opened it.

I would however suggest to check whether it's been perhaps already fixed in upstream's Humble, Jazzy or Rolling branches and we / micro-ROS is just lagging those. Just to avoid posting about an already resolved issue.

I'm not suggesting that we merge the branch that I just made

indeed, I don't believe that'd be necessary. The leak is very small and it doesn't affect functionality in MotoROS2. I would suggest we wait until upstream patches it and then we can update to those versions.

jimmy-mcelwain commented 2 weeks ago

I created an issue here https://github.com/ros2/rcl/issues/1198