SteveMacenski / slam_toolbox

Slam Toolbox for lifelong mapping and localization in potentially massive maps with ROS
GNU Lesser General Public License v2.1

Serialize/deserialize segfault #569

Closed by CarloDnt 1 year ago

CarloDnt commented 1 year ago

Required Info:

Steps to reproduce issue

Call serialize posegraph after a mapping session

Expected behavior

Serialization of posegraph and data

Actual behavior

The node crashes with a segfault (exit code -11) during serialization. RAM consumption is low; I only see a high CPU load. I think it is related to the Jetson, because on an Ubuntu PC it works fine.

Additional information

I tried several configurations, increased the stack size, and tried both the sync and async nodes.

From the mapping session I can still get the occupancy grid. Can I use that in any way to continue mapping without deserializing a previous graph?

SteveMacenski commented 1 year ago

Yes, you can use the save map service, but you won't be able to continue that mapping session or use SLAM Toolbox's localization with it. You'd need to use external localization systems that use occupancy grids.

In terms of the crash, you'd have to give a meaningful stack trace after compiling with debug flags. There's nothing I can do with a nonspecific report like this unfortunately.

CarloDnt commented 1 year ago

First of all, thanks very much for the kind response. Unfortunately I can’t use localisation algorithms like amcl because the environment changes a lot over time; that’s why I was trying to continue mapping. I also can’t get the package compiled: it crashes during compilation without useful insights, but I will try again and come back with a trace. I know that without the trace it is impossible to understand exactly what’s wrong; I opened the issue just to find out whether anyone else is experiencing the same problem. Anyway, thanks again, I will reopen when I get the trace. If I run the node with gdb, can that be helpful?

SteveMacenski commented 1 year ago

It's probably a lack of stack memory, which is the most common reason, or you're trying to serialize an empty file (I believe we fixed that on the ROS 2 and probably Noetic branches, but since Melodic is EOL I haven't been updating that).

If I run the node with gdb, can that be helpful?

Perhaps? It's hard to know, because without debugging flags it may or may not provide us enough detail.

Unfortunately I can’t get the package compiled

You should use a more powerful computer for this development, not the Jetson running on your robot. But if you insist on the Jetson and like pulling teeth, you can increase your swap memory so that you can compile this. That might outright be the issue too. By default, Ubuntu only partitions 2GB of swap, which probably isn't enough.
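To check how much swap a machine currently has before attempting the build, something like the following minimal sketch could be used. It assumes a Linux system that exposes `/proc/meminfo` with a `SwapTotal:` line reported in kB; the function name `swap_total_gib` is hypothetical and not part of slam_toolbox.

```python
def swap_total_gib(meminfo_text: str) -> float:
    """Return the SwapTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # The line looks like "SwapTotal:  2097152 kB"
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("SwapTotal not found in input")


if __name__ == "__main__":
    # Assumption: running on Linux, where /proc/meminfo is available.
    with open("/proc/meminfo") as f:
        print(f"Configured swap: {swap_total_gib(f.read()):.1f} GiB")
```

If the reported value is close to the 2GB default mentioned above, adding swap before compiling on the Jetson is probably worth trying.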

CarloDnt commented 1 year ago

It's probably a lack of stack memory, which is the most common reason, or you're trying to serialize an empty file

It is something related to the pose graph size, because if I make the robot do only a few moves, the serialization works, but only for a very few, like a couple of meters. If I try to explore more, it will not serialize anymore. But I have changed the stack size parameter, even to very large values, and nothing changes.

CarloDnt commented 1 year ago

That said, I don’t want to bother you all too much. If no one has experienced this, we can close the issue and I will come back with a more accurate trace.

SteveMacenski commented 1 year ago

It is something related to the pose graph size, because if I make the robot do only a few moves, the serialization works, but only for a very few, like a couple of meters. If I try to explore more, it will not serialize anymore.

Then that's you running out of memory; you need to allocate more stack memory to be able to serialize and deserialize files. That seems relatively clear.

CarloDnt commented 1 year ago

This can be done just from the configuration parameter, right?

SteveMacenski commented 1 year ago

Only if that memory exists

CarloDnt commented 1 year ago

I finally solved the problem. As Steve mentioned, the problem wasn’t related to slam toolbox but was a configuration limit on the stack size in the OS. More specifically, ulimit -Hs was 1024 and the serialization was of course exceeding that; at first, having 32 GB of RAM, I wasn’t convinced that a lack of memory was the problem. This limit was set in /etc/security/limits.conf; only the hard one has to be changed. I don’t know if all Jetson Xaviers come with this configuration or if my supplier set it, but in any case I will leave this comment in case anyone runs into the same problem.