MIT-SPARK / Kimera-Multi

Index repo for Kimera-Multi system

RViz Unexpected Output #7

Closed: tgodfrey0 closed this issue 9 months ago

tgodfrey0 commented 11 months ago

Hi, thanks for the great work. I am having some issues with RViz. I have been running it with two robots (acl_jackal and acl_jackal2) but the default RViz configuration shows no output. In order to see any RViz output I have to change the Fixed Frame from world to /acl_jackal2/map and the topics for Path0 and Path1 to /acl_jackal/kimera_vio_ros/optimized_trajectory and /acl_jackal2/kimera_vio_ros/optimized_trajectory respectively.

After making these config changes, the path of the first robot appears in RViz, but at a random point I get the following output (I have checked several times, and the issue appears at different times during playback). [Image: cropped]

Do you know what may be causing the issue with the graph and no output being shown with the default RViz config? The library builds with no issues.

Thanks in advance!

yuluntian commented 11 months ago

Hi, thank you for your interest in this work, and sorry that you ran into this issue. The topics you were visualizing (e.g., /acl_jackal/kimera_vio_ros/optimized_trajectory) correspond to the single-robot VIO trajectory estimates, and in general, the VIO trajectories from different robots will not be aligned in the same reference frame.

In contrast, the default path topics in the rviz config correspond to trajectory estimates from distributed pose graph optimization (PGO), and will be aligned in the same frame when the system is working correctly. In this case, the PGO trajectories will only show up toward the end of the dataset, because the two robots acl_jackal and acl_jackal2 only have inter-robot loop closures at the end. This might be the reason for not seeing any outputs initially. Just to check, have you played back the datasets until the very end? If you still see this issue at that point, feel free to share the logs folder here (see issue #6 for a related discussion) and we can try to take a look from there.

tgodfrey0 commented 11 months ago

Hi, thanks for the response. I ran it several times until both playbacks displayed "done", but I still got no output. The DPGO pane shows 0 inter-robot loop closures. I've attached the logs (two_robot_log.zip). Thanks.

yuluntian commented 11 months ago

Thanks for the information. I took a quick look, and it seems that the VIO stops after about 140 sec for acl_jackal and 250 sec for acl_jackal2 (this can be seen in the odometry_poses.csv logs). The most informative next step is to check that the VIO has not crashed for either robot (monitor the vio window in tmux) while playing the datasets.

Also, it might be useful to decrease the rosbag playback rate for debugging. This can be done by changing the RATE arg in 1014_example.yaml, e.g., RATE=0.1.
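
The odometry_poses.csv check described above can be scripted. The following is a minimal sketch, not part of Kimera-Multi: it assumes the first column of the log holds the timestamp in nanoseconds and that any header row is non-numeric. Both are assumptions about the log layout, so adjust `time_col` and `ns_per_unit` if your files differ. The function name `vio_time_span` is hypothetical.

```python
import csv


def vio_time_span(csv_file, time_col=0, ns_per_unit=1e9):
    """Return (first_stamp, last_stamp, duration_in_seconds) for a pose log.

    Assumption: column `time_col` holds timestamps in nanoseconds.
    Rows whose timestamp field is not numeric (e.g. a header) are skipped.
    """
    stamps = []
    for row in csv.reader(csv_file):
        try:
            stamps.append(float(row[time_col]))
        except (ValueError, IndexError):
            continue  # header, blank, or malformed row
    if not stamps:
        raise ValueError("no timestamps found")
    first, last = stamps[0], stamps[-1]
    return first, last, (last - first) / ns_per_unit
```

Calling it on an open log file for each robot, e.g. `vio_time_span(open(path))` with `path` pointing at that robot's odometry_poses.csv, and comparing the duration against the length of the rosbag shows whether the VIO stopped producing poses early.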

tgodfrey0 commented 11 months ago

Thanks for the response. I had a look at the odometry_poses.csv log but I couldn't find the error in there. The first error that appears in the vio pane is this:

E0907 12:18:54.247426 19727 VioBackend.cpp:1413]                                                          
Indeterminant linear system detected while working near variable                                          
8646911284551353073 (Symbol: x753).                                                                       

Thrown when a linear system is ill-posed.  The most common cause for this                                 
error is having underconstrained variables.  Mathematically, the system is                                
underdetermined.  See the GTSAM Doxygen documentation at                                                  
http://borg.cc.gatech.edu/ on gtsam::IndeterminantLinearSystemException for                               
more information.                                                                                         
E0907 12:19:06.232441 19727 VioBackend.cpp:1416] ERROR: Variable has type 'x' and index 753               
E0907 12:19:06.232614 19727 VioBackend.cpp:1444] Adding prior on key: x753                                
E0907 12:19:06.232659 19727 VioBackend.cpp:1444] Adding prior on key: x736                                
E0907 12:19:06.232671 19727 VioBackend.cpp:1444] Adding prior on key: b753                                
E0907 12:19:06.232697 19727 VioBackend.cpp:1444] Adding prior on key: b736                                
E0907 12:19:06.232708 19727 VioBackend.cpp:1444] Adding prior on key: v753                                
E0907 12:19:06.232751 19727 VioBackend.cpp:1444] Adding prior on key: v736                                
E0907 12:19:06.232789 19727 VioBackend.cpp:1493] Attempting to update smoother with added prior factors   

Is this error the cause of my issues? Do you have any suggestions to fix this? Thanks
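
For readers puzzled by the large number in the log: GTSAM packs a symbol into a 64-bit key, with the character in the top 8 bits and the index in the lower 56 bits. The sketch below decodes the key from the error message in plain Python, without needing GTSAM installed; the helper name `decode_gtsam_key` is made up for illustration.

```python
def decode_gtsam_key(key):
    """Split a 64-bit GTSAM key into its (character, index) symbol parts."""
    index_bits = 56                        # lower 56 bits hold the index
    char = chr(key >> index_bits)          # top 8 bits hold the character
    index = key & ((1 << index_bits) - 1)
    return char, index


print(decode_gtsam_key(8646911284551353073))  # ('x', 753)
```

This confirms that the key is the variable x753 named in the message. The x, v, and b prefixes in the following "Adding prior on key" lines appear to follow the common convention of pose, velocity, and IMU bias variables.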

tgodfrey0 commented 11 months ago

Hi, I ran it again and in the vio pane I got the following error:

E0908 20:58:10.010210  7278 VioBackend.cpp:1499] Smoother recovery failed. Most likely, the additional prior factors were insufficient to keep the system from becoming indeterminant.
E0908 20:58:10.011487  7278 VioBackendModule.cpp:34] Backend did not return an output: shutting down Backend.

After this, I got an error in the frontend pane and in the dpgo pane. The error in the frontend pane was

[acl_jackal/distributed_loop_closure/distributed_loop_closure_node-1] process has died [pid 7070, exit code -11, cmd /home/tg/projects/p3p/ws_kimera_multi/devel/lib/kimera_distributed/kimera_distributed_loop_closure_node __name:=distributed_loop_closure_node __log:=/home/tg/.ros/log/f1e06e58-4eaa-11ee-aea8-9f1ad1f0a8c5/acl_jackal-distributed_loop_closure-distributed_loop_closure_node-1.log].
log file: /home/tg/.ros/log/f1e06e58-4eaa-11ee-aea8-9f1ad1f0a8c5/acl_jackal-distributed_loop_closure-distributed_loop_closure_node-1*.log

The error in the dpgo pane was

[ERROR] [1694221731.743118013]: ROS service /acl_jackal/distributed_loop_closure/request_pose_graph does not exist!

Does the vio error kill the process? Do you have any idea what may be causing these issues?

Also, does this output look typical for the dpgo pane? Thanks. [Image: dpgo]
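
A note on reading the "process has died" line above: roslaunch reports the child's return code, and on POSIX systems a negative value means the process was terminated by that signal number, so exit code -11 corresponds to SIGSEGV (a segmentation fault). The sketch below extracts this from such a line; the regular expression and the helper name `explain_exit` are illustrative assumptions based only on the message format shown in this thread.

```python
import re
import signal

# Matches the roslaunch message format seen in this thread.
DIED_RE = re.compile(
    r"\[(?P<name>[^\]]+)\] process has died "
    r"\[pid (?P<pid>\d+), exit code (?P<code>-?\d+)"
)


def explain_exit(line):
    """Return (node_name, pid, reason) for a 'process has died' line, else None."""
    m = DIED_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    if code < 0:
        # Negative return codes mean "terminated by signal -code" on POSIX.
        try:
            reason = "killed by " + signal.Signals(-code).name
        except ValueError:
            reason = f"killed by signal {-code}"
    else:
        reason = f"exited with status {code}"
    return m.group("name"), int(m.group("pid")), reason
```

Applied to the frontend-pane message, this reports that distributed_loop_closure_node-1 (pid 7070) was killed by SIGSEGV, which indicates the loop closure node itself crashed rather than shutting down cleanly.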

yuluntian commented 11 months ago

Hi, sorry for the delay in response. The crash in the vio pane should be the root cause -- given the crash, the behavior in the dpgo pane is expected.

yuluntian commented 11 months ago

@tgodfrey0 A quick ping about this issue - let us know if you have any updates!

tgodfrey0 commented 11 months ago

Hi, sorry I've been super busy. I haven't had a chance to try running only the VIO for that robot yet. But I was running it on Ubuntu 20.04 with ROS Noetic. I have an Intel i7-1165G7 with 16GB RAM and 32GB swap.

I think alongside running just the VIO I'll also try to completely rebuild the library and its dependencies.