Cartographer grpc crashes when requesting submaps

uahic commented 2 years ago

I've tried to evaluate multi-trajectory SLAM via gRPC, with one robot serving as the 'cloud server' (uplink) for another one. Opening RViz with the map display runs the cartographer-rviz submaps plugin, which makes the grpc_client send a proto requesting a submap from the server. This works on the robot which does NOT act as a cloud server, but crashes (without any stack trace) on the robot running the cloud server.

I'm currently going through the code base to understand more of the details, but up to the point where it touches the pose_graph it looks alright to me.

Potential differences when running a grpc_server as the uplink 'master' (aka cloud server) for the 'slave' servers are:

The slaves run a LocalTrajectoryUploader
The master grpc_server does not have range scan data in its submaps as this gets filtered out by the slaves before streaming submaps to the master

Other issues I could imagine are resource conflicts when accessing internal data structures, or missing locks in the multi-threaded code.

Versions:
Cartographer is on the current master branch's head
grpc is on v1.10.0
async grpc is on commit 74cbcb37a6713814a1fc928eacbd2e7e3ffb1289 https://github.com/cartographer-project/async_grpc/commit/74cbcb37a6713814a1fc928eacbd2e7e3ffb1289

@MichaelGrupp is grpc together with rviz working fine for you? Another question is whether global SLAM optimizations are fed back from the upstream 'master' to its slaves. I can't find that in the code, but I might simply have overlooked it. Thank you very much.

tristan-schwoerer commented 2 years ago

Hey, I can't answer your question, but I am currently struggling with the same thing. I am hosting a server on one machine and registering two robots/trajectories on it. I believe that there is simply no feedback from the global optimization on the server back to the robots. I found this video from ROSCon 2018, https://vimeo.com/293260413, where @MichaelGrupp introduced the cloud computing, and if I understood them correctly they feed the remote result back by exporting the pbstream on the server using the write_state service, streaming it to the robots manually, and then localizing in it. It sounds like this is done at intervals and is not really automated by Cartographer.

uahic commented 2 years ago

@Tristan9497 I dived into the grpc code and there is indeed no feedback of the global optimization. The transport isn't hard to do; in fact it seems all the necessary handlers and protobuf messages are in place (for submaps, posegraph), and you have the corresponding methods to get all of these using the posegraph class. However, there are no methods to alter the state of the posegraph nodes, as far as I could see last week. Now, as the pose graph runs in a separate thread and I don't know how all the data structures are linked, I have stepped back for the moment from just inserting new methods that directly manipulate the internals. If you are interested, we could collaborate on this issue.

There is another fork, https://github.com/shreyasgokhale/cartographer_ros (you need his fork of plain cartographer as well), which allows you to send the pbstream file to another grpc server instance. However, as far as I can tell it contains ALL internal data, and every time you do that you would have to relocalize yourself in the existing trajectories. The robots would not really be in the same coordinate system, and the transmission time grows with the size of the recorded map.

tristan-schwoerer commented 2 years ago

Hey @uahic, sure, working on this together sounds nice.

Your idea does sound very demanding on the network, although I am convinced it would work really well. The only thing I am concerned about is the localization step.

I was thinking it might be possible to approximate/calculate the current position of the robots by comparing their submaps, meaning the ones from the local SLAM and then the globally optimized ones. We have very easy access to those, since they are on the ROS network anyway. In the end it's just a little less data than your suggestion, and of course it will grow over time too.

Using the first optimized submap location of both trajectories, we could fairly easily determine the starting positions of the trajectories relative to each other.

Then we could jump to the latest optimized submap, compare it to the first one, and get its position in the "real world", which will already get us very close, depending on submap size.

The remaining part of the trajectory is a not yet finished submap, which would be the transformation between the latest non-optimized submap and the robot. This should be fairly easy to get, since it is just the local SLAM pose relative to the latest non-optimized submap position.

Adding all of that up should allow us to publish map->odom transforms that are as good as possible without constantly localizing.
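
The chain of transforms described above can be sketched in a few lines. This is only a minimal 2D illustration, not Cartographer code: `Pose2D`, `local_to_global`, and `robot_in_map` are hypothetical names, and the sketch assumes that poses are planar (x, y, yaw), that the local SLAM frame plays the role of odom, and that both the optimized (global) and the local pose of the same submap are available.

```python
import math
from dataclasses import dataclass


def normalize_angle(angle: float) -> float:
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


@dataclass(frozen=True)
class Pose2D:
    """Hypothetical planar rigid transform (x, y, yaw in radians)."""
    x: float
    y: float
    theta: float

    def compose(self, other: "Pose2D") -> "Pose2D":
        """Return self * other, i.e. express `other` in the parent frame of self."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(
            self.x + c * other.x - s * other.y,
            self.y + s * other.x + c * other.y,
            normalize_angle(self.theta + other.theta),
        )

    def inverse(self) -> "Pose2D":
        """Return the inverse transform (-R^T t, -theta)."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(
            -(c * self.x + s * self.y),
            -(-s * self.x + c * self.y),
            normalize_angle(-self.theta),
        )


def local_to_global(global_submap_pose: Pose2D, local_submap_pose: Pose2D) -> Pose2D:
    """Transform from the local SLAM frame to the optimized map frame,
    derived from one submap whose pose is known in both frames."""
    return global_submap_pose.compose(local_submap_pose.inverse())


def robot_in_map(global_submap_pose: Pose2D,
                 local_submap_pose: Pose2D,
                 local_robot_pose: Pose2D) -> Pose2D:
    """Robot pose in the map frame: (map <- local) * (local <- robot)."""
    return local_to_global(global_submap_pose, local_submap_pose).compose(local_robot_pose)
```

Under the assumption that the local frame stands in for odom, `local_to_global(...)` is the quantity one would publish as map->odom; it only changes when a newer optimized submap pose arrives, while the robot pose inside the local frame keeps coming from local SLAM.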

It sounds a little hacky, though. Let me know if that makes any sense to you.

uahic commented 2 years ago

@Tristan9497 I was more referring to implementing the missing functions of the posegraph class to (simply?) change the pose of an already existing node.

A client sends only the finished submaps of its trajectories to the 'master' (uplink server). After the optimization steps, the poses of multiple nodes of each trajectory have changed and should be communicated back (that part is easy). On the client side, 'only' the modification methods are missing.

Method 1) The user has to disable global optimization on the clients (that is already possible), and the client's posegraph is constantly modified to reflect that downlink stream. So basically the work here would be to understand the posegraph and optimizer classes and allow for modifications via new API methods.

Method 2) In the presence of an uplink server, we replace the posegraph class instantiation on the client altogether with the posegraph_stub class. This might cause a larger network load, as you get sent data from all trajectories back, and right now I don't know how much work is left in that stub class.
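
To make Method 1 a bit more concrete, here is a rough client-side sketch. Everything in it is hypothetical: `ClientPoseGraphMirror`, `add_node`, and `apply_optimized_poses` are not part of Cartographer's posegraph API. It only illustrates the idea of overwriting the poses of already existing nodes from a downlink stream while holding a lock, under the assumption that global optimization is disabled on the client so that nothing else re-optimizes these poses.

```python
import threading
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical identifiers: (trajectory_id, node_index) and a planar pose (x, y, yaw).
NodeId = Tuple[int, int]
Pose = Tuple[float, float, float]


class ClientPoseGraphMirror:
    """Hypothetical client-side store of global node poses.

    Local SLAM adds nodes from one thread; a downlink handler overwrites
    their poses with the optimized values from another thread.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._poses: Dict[NodeId, Pose] = {}

    def add_node(self, node_id: NodeId, pose: Pose) -> None:
        """Called by local SLAM when a new node is created."""
        with self._lock:
            self._poses[node_id] = pose

    def apply_optimized_poses(self, updates: Iterable[Tuple[NodeId, Pose]]) -> int:
        """Overwrite poses of nodes that already exist; skip unknown nodes.

        Returns the number of poses that were actually updated.
        """
        applied = 0
        with self._lock:
            for node_id, pose in updates:
                if node_id in self._poses:
                    self._poses[node_id] = pose
                    applied += 1
        return applied

    def get_pose(self, node_id: NodeId) -> Optional[Pose]:
        """Return the current pose of a node, or None if it is unknown."""
        with self._lock:
            return self._poses.get(node_id)
```

Taking the lock once per batch keeps a reader from seeing a half-applied optimization result. How this would interact with the constraints and the optimizer thread inside the real posegraph is exactly the open question in this thread, and the sketch does not address it.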

tristan-schwoerer commented 2 years ago

Oh, I see now what you are trying to do, and I think this is definitely the right way to go. To be honest, I think I need to spend some time with the cartographer pose_graph stuff to actually know what is going on in there. I think this would be an amazing feature to have.

If I have time on the weekend I will investigate a little.

adiego73 commented 2 years ago

@Tristan9497 @uahic I am currently facing the same problem (not the crash, but the sync between master and slave). Have you started to work on this issue? I am willing to contribute and work with you on this; however, I am not using ROS, just cartographer in a standalone manner.

bufeng-12 commented 2 years ago

Hello! I have received your email! Thank you! -- 曾君

uahic commented 2 years ago

@adiego73 Hi Diego, not yet. The reason is that there is currently no demand for it in my institution (it may come back soon though!), which doesn't allow me to work on this issue, at least during official work hours. Still, I'd consider working on it at a very slow pace from my side. ROS doesn't matter much for this issue (it's basically just an additional wrapper package), and as far as I can see the methods to exchange data with gRPC already exist for virtually all interesting data structures.

I studied the RFCs of the former Cartographer developer group, and they mentioned the desire to implement all of this but postponed it, with some comments that they still had to decide how to implement it, which sounds like: it may not be as trivial as it looks (potentially!). I can't judge (yet) the internals of the classes which run the optimization loops asynchronously, but it all really seems to boil down to "how to insert data on the fly without messing up the optimization and constraints". I can't remember all the details from before Christmas, when the knowledge was 'fresh', but I think studying these classes (posegraph optimization or something like that, or just posegraph?) is really necessary.

The first step may be to draw some rough diagrams of the overall architecture and the interaction between classes. I'm very busy right now, but I will come back to this issue.

adiego73 commented 2 years ago

Great, I will start by looking at those classes to understand how all this works. Thanks!

tristan-schwoerer commented 2 years ago

@adiego73 Hey, sorry, I was not able to work on this and did not get any further, since I was too busy with the project I was working on at that time.

cartographer-project / cartographer

Cartographer grpc crashes when requesting submaps #1867