Open jankolkmeier opened 1 year ago
Hi, the reason for Client.Call hanging is that it blocks until a response is received, which requires calling Ros2cs.SpinOnce, resulting in a deadlock.
When using Client.CallAsync you can still call Ros2cs.SpinOnce while the response is pending, allowing it to be received.
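To make the deadlock easier to picture, here is a toy single-threaded model in Python. It is not the ros2cs API: `ToyExecutor`, `ToyFuture`, `call_async`, `spin_once` and `call_without_spinning` are hypothetical names, and the "server" simply doubles the request. The only point is that a response is delivered inside `spin_once`, so a caller that waits without spinning never sees it.

```python
from collections import deque


class ToyFuture:
    """Holds the eventual response of a toy service call."""

    def __init__(self):
        self.done = False
        self.result = None


class ToyExecutor:
    """Hypothetical stand-in for a node: responses arrive only in spin_once()."""

    def __init__(self):
        self._pending = deque()

    def call_async(self, request):
        future = ToyFuture()
        # The reply is queued here but not delivered until spin_once() runs.
        self._pending.append((future, request * 2))
        return future

    def spin_once(self):
        if self._pending:
            future, response = self._pending.popleft()
            future.result = response
            future.done = True

    def call_without_spinning(self, request, max_polls=1000):
        # Models a blocking call on the spinning thread: it waits for the
        # future but nothing in the loop delivers the response. max_polls
        # exists only so this demo terminates instead of hanging.
        future = self.call_async(request)
        polls = 0
        while not future.done and polls < max_polls:
            polls += 1
        return future


# Waiting without spinning: the response never arrives.
blocked = ToyExecutor().call_without_spinning(21)
blocked_done = blocked.done

# Asynchronous call: the caller keeps spinning while the response is pending.
executor = ToyExecutor()
pending = executor.call_async(21)
while not pending.done:
    executor.spin_once()

print("blocking call completed:", blocked_done)
print("async call result:", pending.result)
```

In the real client the same pattern would presumably mean calling Ros2cs.SpinOnce in a loop (or from another thread) until the task returned by Client.CallAsync completes.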
From a quick glance at the generated source it seems that the error is caused by performing a Marshal.Copy between an int8[] in OccupancyGrid and a char[] in C#, which could be problematic since a char in C# is two bytes and therefore larger when using the same number of elements, resulting in more memory being accessed than the native array actually holds.
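As a rough back-of-the-envelope check of that explanation, the sketch below computes how many bytes an element-count-based copy would touch for the map sizes mentioned in this thread. It assumes one int8 per grid cell and uses `ctypes.c_uint16` as a stand-in for the two-byte C# char; `bytes_touched` is a hypothetical helper, not part of ros2cs.

```python
import ctypes

INT8_SIZE = ctypes.sizeof(ctypes.c_int8)            # 1 byte per grid cell
CSHARP_CHAR_SIZE = ctypes.sizeof(ctypes.c_uint16)   # 2 bytes, stand-in for a C# char


def bytes_touched(width, height):
    """Return (bytes in the int8 array, bytes touched as char, overrun)."""
    cells = width * height
    available = cells * INT8_SIZE
    touched_as_char = cells * CSHARP_CHAR_SIZE
    return available, touched_as_char, touched_as_char - available


for side in (32, 512, 2048):
    available, touched, overrun = bytes_touched(side, side)
    print(f"{side}x{side}: {available} bytes available, "
          f"{touched} touched as char, {overrun} past the end")
```

The overrun grows with the map size, which would be consistent with larger maps crashing sooner.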
If you want, you can edit BASIC_IDL_TYPES_TO_MARSHAL_ARRAY in ./src/ros2cs/rosidl_generator_cs/rosidl_generator_cs/generate_cs_impl.py and replace 'int8': 'char' with 'int8': 'byte' and check if that fixes the issue.
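If you would rather script the change than edit by hand, something like the following could work. This is only a sketch: `patch_int8_mapping` is a hypothetical helper, the `sample` string is a made-up excerpt containing just the one entry quoted above (the real dictionary has other entries not shown here), and the exact spacing and quoting in generate_cs_impl.py is an assumption.

```python
OLD_ENTRY = "'int8': 'char'"
NEW_ENTRY = "'int8': 'byte'"


def patch_int8_mapping(source_text):
    """Return source_text with the int8 marshal-array entry switched to byte."""
    if OLD_ENTRY not in source_text:
        # Fail loudly rather than silently leaving the file unchanged.
        raise ValueError("int8 -> char entry not found; check the file by hand")
    return source_text.replace(OLD_ENTRY, NEW_ENTRY, 1)


# Hypothetical excerpt for illustration only, not the real file contents.
sample = "BASIC_IDL_TYPES_TO_MARSHAL_ARRAY = {\n    'int8': 'char',\n}\n"
patched = patch_int8_mapping(sample)
print(patched)
```

To apply it, read generate_cs_impl.py into a string, pass it through the helper, write the result back, and then do a clean rebuild so the message classes are regenerated.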
Thanks a lot for the pointers regarding the deadlock. Also thanks a lot for the pointers regarding the int8/char types!
I just tried the change you suggested (needed to do a clean rebuild to get it to compile happily), and the issue seems to be fixed. As a test, I ran the example at 100 Hz, with a map of size 2048x2048 10000 times - no crashes! All unit tests also still succeed for me.
The Issue
I'm trying to make a service call to the nav2 map_server (calling /map_server/map with nav_msgs/srv/GetMap) to retrieve the current map. This call sometimes works, and sometimes crashes with the log below.
As far as I understand, there is an issue with allocating memory for the OccupancyGrid data field in the GetMap Response. The issue occurs more often and more reliably when there is more data to allocate (i.e. when the maps are larger), and when calls are made at a higher frequency.
Reproducing
I created an example client on this branch, including instructions and resources on how to reproduce the issue (i.e. how to start the map_server).
On a side note, I also noticed strange behavior when using .Call() instead of .CallAsync(), with .Call() never returning the response and just hanging (irrespective of the map size) - not sure if/how this is related.
Background
I've originally encountered this issue with Ros2ForUnity (where the issue causes the entire Editor to segfault), but found that it can also be reproduced in ros2cs.
I noticed that the issue reproduces faster when the maps are larger (512x512 immediately kills it on Ubuntu, while 32x32 can be called up to 200 times before crashing). I also noticed that it depends on the calling frequency (calling at 1 Hz seems to work more reliably with 32x32 than 10 Hz, where it crashes sooner). Finally, the issue is a little harder to reproduce on Windows 10, but reliably reproduces when using maps of size 512x512, even at 1 Hz.
System/Setup
I have tried many combinations of using overlay/standalone versions of ros2cs and Ros2ForUnity, both the binary releases from github and my own builds (from current develop branch), but the issue always occurs.
As mentioned - it occurs both on Ubuntu 22.04 and Windows 10 (with the caveat that on Windows, I've only tested inside Unity, not vanilla ros2cs). The Windows system is a desktop PC with an AMD Ryzen 5950X and 64GB Memory. The Ubuntu system is a ThinkPad P15v with an Intel Core i7-12700H and 32GB Memory. Both are running humble.
Next Steps
I will try to see if I can set up a unit test for this, as it seems to be related to allocating large int arrays (in my current test case, the data field on nav_msgs/OccupancyGrid). Please let me know if there is anything else I can try/do.