RobotecAI / ros2cs

A C# (.Net) implementation of ros2 client library (rcl), enabling communication between ros2 ecosystem and C#/.Net applications such as Unity3D
Apache License 2.0

System.AccessViolationException when calling map_server service #55

Open jankolkmeier opened 1 year ago

jankolkmeier commented 1 year ago

The Issue

I'm trying to make a service call to the nav2 map_server (calling /map_server/map with nav_msgs/srv/GetMap) to retrieve the current map. This call sometimes works and sometimes crashes with the log below.

As far as I understand, there is an issue with allocating memory for the OccupancyGrid data field in the GetMap Response. The issue occurs more often/reliably when there is more data to allocate (i.e. when the maps are larger), and if calls are made with higher frequency.


I created an example client on this branch, including instructions and resources on how to reproduce the issue (i.e. how to start the map_server).

On a side note, I also noticed strange behavior when using .Call() instead of .CallAsync(): .Call() never returns the response and just hangs, irrespective of the map size. I'm not sure if or how this is related.


I originally encountered this issue with Ros2ForUnity (where it causes the entire Editor to segfault), but found that it can also be reproduced in ros2cs.

I noticed that the issue reproduces faster when the maps are larger (512x512 immediately kills it on Ubuntu, while 32x32 can be called up to 200 times before crashing). I also noticed that it depends on the calling frequency (calling at 1 Hz seems to work more reliably with 32x32 than 10 Hz, where it crashes sooner). Finally, the issue is a little harder to reproduce on Windows 10, but reliably reproduces when using maps of size 512x512, even at 1 Hz.


I have tried many combinations of using overlay/standalone versions of ros2cs and Ros2ForUnity, both the binary releases from github and my own builds (from current develop branch), but the issue always occurs.

As mentioned, it occurs on both Ubuntu 22.04 and Windows 10 (with the caveat that on Windows, I've only tested inside Unity, not vanilla ros2cs). The Windows system is a desktop PC with an AMD Ryzen 5950X and 64GB of memory. The Ubuntu system is a ThinkPad P15v with an Intel Core i7-12700H and 32GB of memory. Both are running ROS 2 Humble.

Next Steps

I will try to see if I can set up a unit test for this, as it seems to be related to allocating large int arrays (in my current test case, the data field on nav_msgs/OccupancyGrid). Please let me know if there is anything else I can try/do.

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Buffer.Memmove(Byte ByRef, Byte ByRef, UIntPtr)
   at System.Runtime.InteropServices.Marshal.CopyToManaged[[System.Char, System.Private.CoreLib, Version=, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](IntPtr, Char[], Int32, Int32)
   at nav_msgs.msg.OccupancyGrid.ReadNativeMessage(IntPtr)
   at nav_msgs.srv.GetMap_Response.ReadNativeMessage(IntPtr)
   at nav_msgs.srv.GetMap_Response.ReadNativeMessage()
   at ROS2.Client`2[[System.__Canon, System.Private.CoreLib, Version=, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].ProcessResponse(Int64, ROS2.Internal.MessageInternals)
   at ROS2.Client`2[[System.__Canon, System.Private.CoreLib, Version=, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].TakeMessage()
   at ROS2.Ros2cs+<>c.<SpinOnce>b__18_1(ROS2.IClientBase)
   at System.Collections.Generic.List`1[[System.__Canon, System.Private.CoreLib, Version=, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].ForEach(System.Action`1<System.__Canon>)
   at ROS2.Ros2cs.SpinOnce(System.Collections.Generic.List`1<ROS2.INode>, Double)
   at Examples.ROS2MMapServerClient.Main(System.String[])
/home/jan/ros2cs/install/lib/ros2cs_examples/ros2cs_mapserverclient: line 5: 65035 Aborted                 (core dumped) dotnet ${SCRIPTDIR}/dotnet/ros2cs_mapserverclient.dll
[ros2run]: Process exited with failure 134

Deric-W commented 1 year ago

Hi, Client.Call hangs because it blocks until a response is received, but receiving the response requires calling Ros2cs.SpinOnce. Since the blocked call prevents that, the result is a deadlock. When using Client.CallAsync, you can still call Ros2cs.SpinOnce while the response is pending, allowing it to be received.
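To illustrate the difference, here is a minimal single-threaded model of the situation. It is a toy sketch in Python, not the ros2cs API: `ToyClient`, `call_async` and `spin_once` are hypothetical names, and the only assumption carried over from the explanation above is that a response is delivered solely while something spins.

```python
from collections import deque
from concurrent.futures import Future


class ToyClient:
    """Toy stand-in for a service client. Hypothetical, NOT the ros2cs API."""

    def __init__(self):
        self._incoming = deque()  # responses that arrived but were not yet taken
        self._pending = deque()   # futures still waiting for their response

    def call_async(self, request):
        future = Future()
        self._pending.append(future)
        # Pretend the server answers at once; the answer stays queued
        # until someone spins.
        self._incoming.append(f"response to {request}")
        return future

    def spin_once(self):
        """Deliver at most one queued response, like one spin iteration."""
        if self._incoming and self._pending:
            self._pending.popleft().set_result(self._incoming.popleft())


client = ToyClient()
future = client.call_async("GetMap")

# A blocking call would wait on the future right here. Nothing has spun yet,
# so the future is not done and a blocking wait would never return.
# We only check done() instead of actually blocking.
blocked = not future.done()

# Async pattern: keep spinning while the response is pending.
spins = 0
while not future.done():
    client.spin_once()
    spins += 1

print(blocked, spins, future.result())
```

In this model the future only completes once `spin_once` has run, which is why waiting on it before spinning would never return.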

From a quick glance at the generated source, it seems that the error is caused by performing a Marshal.Copy between an int8[] in OccupancyGrid and a char[] in C#. This could be problematic since a char in C# is two bytes, so copying the same number of elements accesses more memory than the native array holds.
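As a rough worked example of that size mismatch, the sketch below computes how many bytes the native int8 buffer holds versus how many a copy of the same element count into a 2-byte-per-element array would read. The function name `bytes_touched` is made up for illustration, and the map sizes are the ones mentioned earlier in this thread.

```python
from typing import Tuple

INT8_SIZE = 1         # one ROS 2 int8 element in the native message, in bytes
CSHARP_CHAR_SIZE = 2  # System.Char is a UTF-16 code unit, in bytes


def bytes_touched(width: int, height: int) -> Tuple[int, int]:
    """Return (bytes held by the int8 buffer, bytes read when the same
    number of elements is copied as 2-byte chars)."""
    cells = width * height
    return cells * INT8_SIZE, cells * CSHARP_CHAR_SIZE


for side in (32, 512, 2048):
    available, read = bytes_touched(side, side)
    print(f"{side}x{side}: buffer={available} B, read={read} B, "
          f"overrun={read - available} B")
```

For a 512x512 map this gives a 262144-byte buffer and a 524288-byte read, so the copy would run 262144 bytes past the end. The overrun grows with the map, which would be consistent with larger maps crashing sooner.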

Deric-W commented 1 year ago

If you want, you can edit BASIC_IDL_TYPES_TO_MARSHAL_ARRAY in ./src/ros2cs/rosidl_generator_cs/rosidl_generator_cs/ and replace 'int8': 'char' with 'int8': 'byte', then check whether that fixes the issue.
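A sketch of what that change amounts to, assuming the mapping is a plain Python dict. Only the 'int8' entry comes from this thread; the `_BEFORE`/`_AFTER` names, the size tables and the `mismatched` helper are hypothetical and exist only to show why the old entry is suspect.

```python
# Hypothetical excerpt: only the 'int8' entry is taken from this thread.
# The real dict contains more entries, which are not reproduced here.
BASIC_IDL_TYPES_TO_MARSHAL_ARRAY_BEFORE = {
    'int8': 'char',  # 2-byte C# element type for a 1-byte ROS type
}
BASIC_IDL_TYPES_TO_MARSHAL_ARRAY_AFTER = {
    'int8': 'byte',  # 1-byte C# element type, same width as int8
}

# Element sizes in bytes, used only for this check.
IDL_SIZES = {'int8': 1}
CSHARP_SIZES = {'char': 2, 'byte': 1}


def mismatched(mapping):
    """List IDL types whose C# marshal type has a different element size."""
    return [idl for idl, cs in mapping.items()
            if IDL_SIZES[idl] != CSHARP_SIZES[cs]]


print(mismatched(BASIC_IDL_TYPES_TO_MARSHAL_ARRAY_BEFORE))
print(mismatched(BASIC_IDL_TYPES_TO_MARSHAL_ARRAY_AFTER))
```

The check flags 'int8' under the old mapping and nothing under the new one.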

jankolkmeier commented 1 year ago

Thanks a lot for the pointers regarding both the deadlock and the int8/char types!

I just tried the change you suggested (I needed to do a clean rebuild to get it to compile happily), and the issue seems to be fixed. As a test, I ran the example at 100 Hz with a 2048x2048 map for 10000 calls, with no crashes! All unit tests also still succeed for me.