ROBOTIS-GIT / turtlebot3_home_service_challenge

Turtlebot3 Home Service Challenge(with OpenManipulator X)
Apache License 2.0
8 stars 9 forks source link

[WARN] Timed out waiting for transform to become available before running costmap #9

Closed ghost closed 4 years ago

ghost commented 4 years ago

This is another error that's constantly displayed to the terminal after launching the demo launch file. As a result (I guess), my RVIZ is also filled with errors. Screenshot from 2020-05-19 15-34-32 Screenshot from 2020-05-19 15-34-22

The TurtleBot3 model is white and the OpenMANIPULATOR-X is in a weird position where most of the joints are not connected and the links are in strange positions. Similar to the other two issues, this is on a clean install with all the other installation and setup steps giving me no errors.

How can this be fixed?

ROBOTIS-Will commented 4 years ago

Hi,

If you haven't sync the time between your PC and TurtleBot3, please use below command to sync the time of two machines.

$ sudo ntpdate ntp.ubuntu.com

Thank you.

ghost commented 4 years ago

Hi @ROBOTIS-Will,

I ran the sudu ntpdate ntp.ubuntu.com on the remote PC, after which I ran it on the TurtleBot3 (using SSH). However, the error still occurs.

What I did notice, is that changing the fixed frame topic from "map" to "tb3_hsc/base_footprint" cleared most of the errors and it gave me a correct 3D model of the TurtleBot3, a camera view and LIDAR pointcloud/measurement data visualization. However, as the screenshot below also shows, there are other errors that occur with this change. Such as the map that has shifted. Screenshot from 2020-05-20 08-52-31

ROBOTIS-Will commented 4 years ago

Since you don't have the actual arena, the amcl plugin for localization is not working properly and positioning the TurtleBot3 in the map also fails. You'll need to build the arena to try the example with real robot successfully.

ghost commented 4 years ago

I just replaced the included arena map with my own map that I created using the SLAM step of the tutorial. This still gave me the same issues. Does this mean that somewhere in the code, there is something hardcoded that prevents this from working with my own space?

ROBOTIS-Will commented 4 years ago

After the RViz is up and running, try 2D Pose Estimate from the Rviz menu. http://emanual.robotis.com/docs/en/platform/turtlebot3/navigation/#estimate-initial-pose This will help to set the initial pose of the robot in the map.

ghost commented 4 years ago

Thanks for the suggestion! I'll give it a try on Saturday when I'm back at the testing facility.

ghost commented 4 years ago

@ROBOTIS-Will Alright, I've tried a bunch of things yesterday and today, and I've finally got it to work (didn't even have to use the 2D Pose Estimate). I think it was a combination of synchronizing the time between the TB3 and RemotePC, updating the RPi 3, performing a catkin_make on the RemotePC and perhaps some luck :)

Anyway, this issue can be closed now. Thanks for the help, as always :)

ROBOTIS-Will commented 4 years ago

@The3Designer That's great! Hope you are safe from the Covid-19 :)

ghost commented 4 years ago

Sorry for reopening this issue, but it stopped working again and I honestly don't know why it is so inconsistent.

I've started with a clean install of Ubuntu 16.04 on Oracle VirtualBox. I began with the general ROS 1 Kinetic setup. Then I continued with the OpenMANIPULATOR-X packages, after which I installed the packages from the TurtleBot3 Manipulation chapter. Then I installed the home service challenge pakcages and performed all the calibration steps and created my own map. I replaced the stock tb3_hsc.pgm file with the one I created with SLAM, so that Rviz would use the correct map.

So everything is stock and as it should be, except for the map. Still I get the same error shown in the first post. I've tried doing the 2D estimate position, performing many clean installs, re-running calibration, installing OpenGL graphic drivers, performing catkin_make command, synchronizing the time...

I'm lost on what I should do...

cnboonhan commented 4 years ago

Hi there, just dropping by and chiming in with my thoughts. I do not have this setup, but I did once encounter an error rather similar to the error you have, on a different setup. Perhaps it could at least give some sort of insight.

The crux was that the transform from base_footprint to map is being published, just that it is done juuust slightly too late. ( Therefore violating the timeout constraint of 0.1 ). In my case of a mobile robot, a quick solution was to increase the publish frequency of my robots LiDAR by increasing its spinning speed. I did this by manually increasing a PWM parameter in the source code of it's driver. This seemed to relieve some sort on bottleneck which was "throttling" the tf. In the same vein, I wonder if some sort of similar speedup could be applied / the timeout constraint could be relaxed.

In any case, running roswtf might give some additional insights as well.

ghost commented 4 years ago

@cnboonhan94 That's an interesting theory, I'll see if the LIDAR can speed up, or if I can lower the timeout constraint a bit.

Although I'm not sure if that's actually the issue, because changing the Fixed Frame to /tb3_hsc/base_footprint does give me proper LIDAR data point cloud visualization... But I'm going to try it nevertheless 👍

Here's what I get when running the roswtf command while Rviz and the TB3 manager are launched: roswtf

ghost commented 4 years ago

For some additional information, here is what "rosrun tf view_frames" gives me: frames

And here is what "rosrun tf tf_echo /map /base_footprint" gives me: Screenshot from 2020-05-27 14-11-12

ghost commented 4 years ago

Update: I've gotten it to work consistently, by changing the X and Y coordinate values in the room.yaml file to values that actually fit into the room/map file (the standard arena is bigger than my room haha).

However, I still get some warnings, but I'm not sure if this is a real issue: Screenshot from 2020-05-27 14-58-51 Do these "Map update loop missed ..." errors matter?

cnboonhan commented 4 years ago

Oh that's so interesting, I can't imagine why such a side effect would occur from this room dimension configuration. Perhaps this difference was just enough to slow down amcl to "throttle" the tfs, but not enough to blow it up? :man_shrugging: awesome work though.

I have been living with this warning ( regarding the map update rate ) without much issue in my use case. Perhaps you could observe how the update loop frequency varies with load on your system? If it's too drastic, it might cause some bad navigation behaviours. I have not personally noticed anything significant, using a raspberry pi 4.

ghost commented 4 years ago

@cnboonhan94 Well, my enthousiasm didn't last long, as the "fix" with the room positions, didn't work consistently in the end. After about 6-7 succesfull launches, the error got back again and I wasn't able to get it back up and running again. The inconsistency is really frustrating.

I have also tried running the entire HSC from an actual Ubuntu installation (dual boot with W10) instead of running it on a VM, since I read some things about Rviz not functioning properly inside some VMs. However, this didn't fix the problem and simply resulted in the same error/situation.

So, back to square one I guess... :(

I'm trying to find the code that determines the timeout value, but I haven't been able to find it yet. Do you know where this value should be located?

ghost commented 4 years ago

I don't know if this has anything to do with the main problem, but I'm also getting this error message when switching the Fixed Frame under Global Settings to "tb3_hsc/base_footprint" (remote.demo.launch terminal on the left, manager.launch terminal on the right): Screenshot from 2020-05-28 10-57-30

When I leave the Fixed Frame on "map", this issue doesn't occur because then there's no camera feed.

Edit: never mind, this error also occurs when leaving the Fixed Frame setting on "map". It just takes a bit longer for the error message to show...

cnboonhan commented 4 years ago

i totally understand, non-deterministic behavior is a pain. :white_flag:

I don't mean to bring you on a wild goose chase, but i would be super interested to see what would happen if you did the following:

# Build AMCL with some modifications
# in your workspace src: 
git clone git@github.com:ros-planning/navigation.git
# Edit navigation/costmap_2d/src/costmap_2d_ros.cpp, line 98
# You should see     if (last_error + ros::Duration(5.0) < ros::Time::now())
# Change to     if (false)

# Rebuild
catkin build
source devel/setup.bash

Essentially force-removing the error. :crossed_fingers: Note that we will have to make sure this local version of amcl is used. To be absolutely sure, we could remove the existing binaries. I think its sudo apt remove ros-[ROS_VERSION]-amcl. But this should not be necessary. I might be typing some commands wrong, so do double check it makes sense before typing them.

EDIT: for context, we are changing the if statement:

    if (last_error + ros::Duration(5.0) < ros::Time::now())
    {
      ROS_WARN("Timed out waiting for transform from %s to %s to become available before running costmap, tf error: %s",
               robot_base_frame_.c_str(), global_frame_.c_str(), tf_error.c_str());
      last_error = ros::Time::now();
    }

to

    if (false)
    {
      ROS_WARN("Timed out waiting for transform from %s to %s to become available before running costmap, tf error: %s",
               robot_base_frame_.c_str(), global_frame_.c_str(), tf_error.c_str());
      last_error = ros::Time::now();
    }
ghost commented 4 years ago

I'll give your commands a try. I also found something about chrony local network time synchronization (https://github.com/ROBOTIS-GIT/turtlebot3/issues/78#issuecomment-391832742 & https://chrony.tuxfamily.org/doc/3.3/chrony.conf.html#_isolated_networks), so I'll be giving that a go too.

ghost commented 4 years ago

@cnboonhan94 Sadly enough your code did not work. It did suppress the error message from the terminal, but Rviz still showed the errors in the panels. The chrony isolated network synchronization also didn't work...

I'm starting to get a bit desperate... This is for my thesis project, so I really want it to work :(

cnboonhan commented 4 years ago

I absolutely was not paying attention to the code, that was a hilarious oversight, but do have a look at this line instead, just slightly above: https://github.com/ros-planning/navigation/blob/melodic-devel/costmap_2d/src/costmap_2d_ros.cpp#L95

 while (ros::ok()
L95>>      && !tf_.canTransform(global_frame_, robot_base_frame_, ros::Time(), ros::Duration(0.1), &tf_error))
  {
    ...
  }

I should have pointed to this line instead. Specifically, i refer to the ros::Duration(0.1), which seems to correspond to the timeout of 0.1s in your warnings. Could you perhaps try to change 0.1 to 0.2, which seems appropriate for the values you have?

ghost commented 4 years ago

I'd like to try that one, because it sounds like a possible solution. However, this timeout value must also be present in the current packages, right? I should not have to download the ROS planning/navigation packages for that, since the HSC can perform without those.

It's not that I don't want to install those packages, but if it's not necessary :)

And it might take a bit longer before I can change that value, because I'm doing a fresh install on both the TB3 and the RemotePC... Just for good measure...

cnboonhan commented 4 years ago

Yea, that's true, you shouldn't need to download the navigation source code. I'm afraid I'm not sure exactly why this error is appearing, but it does seem to be hardware-related ( the timeout of 0.1 is so slightly shy of what the current hardware is able to produce ). I could definitely be wrong, though. All the best bug hunting :bug:

ghost commented 4 years ago

Yeah, the inconsistency also makes me believe it's not software related. Anyway, thanks a lot for the help so far :)

Hopefully someone else can help me...

ghost commented 4 years ago

@ROBOTIS-Will or @ROBOTIS-Kayman I have a question for you:

Just to be sure that the error is not because of my modifications, can this HSC tutorial be broken and result in the error message from the first post, if I do one or more of the following steps:

I'm pretty sure these modifications shouldn't result in the error from my first post, since I've also tried the stock tutorial and this results in the same error. But I just want to here it from the professionals ;)

These are all the modifications that I've done to the stock HSC tutorial to make it work for my assignment. Do any of these steps break the assignment, or is the cause for the error somewhere else?

I didn't notice any difference between running ROS in a VM or in a full Ubuntu install (dual boot), so I don't think that can be the issue.

For the VM, I've tried running the HSC with and without the VirtualBox guest additions CD installed, but this made no difference either.

By process of elimination, what can the cause be...?

ghost commented 4 years ago

I absolutely was not paying attention to the code, that was a hilarious oversight, but do have a look at this line instead, just slightly above: https://github.com/ros-planning/navigation/blob/melodic-devel/costmap_2d/src/costmap_2d_ros.cpp#L95

 while (ros::ok()
L95>>      && !tf_.canTransform(global_frame_, robot_base_frame_, ros::Time(), ros::Duration(0.1), &tf_error))
  {
    ...
  }

I should have pointed to this line instead. Specifically, i refer to the ros::Duration(0.1), which seems to correspond to the timeout of 0.1s in your warnings. Could you perhaps try to change 0.1 to 0.2, which seems appropriate for the values you have?

I've tried this suggestion, but it didn't work. Instead of the error being "... returned after 0.100xxx timeout was 0.1" the message was "... returned after 0.200xxx timeout was 0.2". So sadly enough, the problem isn't fixed yet.

ghost commented 4 years ago

Another thing I discovered is that both the robot_description and robot_description_semantic are not correct when starting the launch file: Screenshot from 2020-05-29 16-50-08

Both descriptions begin as a regular xml file. I don't know if this has anything to do with the original issue, but how can I solve this and make the descriptions correct?

ghost commented 4 years ago

Another update.

I just had the whole thing working. Then I rebooted the TB3 and my RemotePC. I didn't change a thing, but when I started the HSC, it didn't work...

Another interesting thing is that the regular manipulation navigation (http://emanual.robotis.com/docs/en/platform/turtlebot3/manipulation/#navigation) works flawlessly and consistently. Within 1-2 seconds I get a costmap and an estimate position of the TB3 within my room. This works every single time I boot it up.

So then I wonder, what's so different about the HSC setup (and with setup I mean the code, because obviously the hardware setup is the same for both the regular manipulation navigation and the HSC) that makes it not function properly and consistently?

As mentioned before, I need this to work for my thesis project, so I would really like it if someone could help me fix this issue :)

ghost commented 4 years ago

I have used the rqt_graph command to see the differences between the regular manipulation navigation (from here: http://emanual.robotis.com/docs/en/platform/turtlebot3/manipulation/#navigation) and the HSC navigation (from here: http://emanual.robotis.com/docs/en/platform/turtlebot3/home_service_challenge/#run-a-demo-and-manager-package).

Note that the regular manipulation navigation works every time. I pretty much instantly get a costmap, pose estimate and I can give navigation goals which the TB3 then perfectly executes. On the other side, the HSC still is not working, with the error from the first post.

The image below shows the difference between the two rqt_graph commands. I think the interesting and relevant difference is the fact that the /tb3_hsc/map topic is not used in the HSC graph. Where should this be connected to to make this HSC work, and how should I do that? rqt_graph_differences

ghost commented 4 years ago

Is there nobody that can help me? @ROBOTIS-Will, @ROBOTIS-Kayman?

I've just installed a Raspberry Pi image from Ubuiqity Robotics and installed all the TB3 dependencies and packages, but with no luck. I'm still getting stuck at the same point.

So the regular navigation function works great, no problems there. The costmap is generated pretty much instantly and when I set a navigation goal, the TB3 goes to that point without any problems. From the HSC, the OpenMANIPULATOR-X part also works. I can send messages to the manager, which will then put the robotic arm in the specified pose. Furthermore, I am able to perform all the calibration steps from the HSC. So, it's not the entire HSC that is broken.

From my reasoning then, there has to be something wrong with the navigation/localization part of the HSC, perhaps in combination with my setup (maybe my room is too small for the HSC for example, due to a variable somewhere, but this variable does not apply to the regular navigation package, which is why that one does function properly).

However, I don't know what's causing my problem, but I've seen three videos (including the one from you guys/ROBOTIS) where this HSC is working on the same setup (TB3 Waffle Pi with OpenMANIPULATOR-X). Therefore, I believe it has to be possible to get this to work on my setup too.

Since I need this HSC to work for my thesis project, I would really like some suggestions on what could be the cause for my problem. As this is the final part I'm stuck on, I really want to resolve it as quick as possible. Hopefully you guys understand my situation and can give me some guidance or tips on what the issue could be :)

ROBOTIS-Will commented 4 years ago

@The3Designer Since you are having trouble with the timing, could you try using the ethernet cable instead of wifi network? Then sync time for both machines with the sudo ntpdate ntp.ubuntu.com command might help to figure out what could cause the timing issue.

ghost commented 4 years ago

@ROBOTIS-Will I've tried your suggestion, but it didn't work. I connected the TB3 with an ethernet cable. Then I ran the sudo ntpdate ntp.ubuntu.com command om both the TB3 and RemotePC, but I still got the timeout error.

I also ran a speedtest on the TB3 and RemotePC (in the actual VM), these are the results I got:

TB3:

RemotePC:

So the TB3 has a slower connection, but I would guess that it's still fine.

Additional information I've also installed iperf on the TB3 and RemotePC to check the data transfer bandwith between the two machines over my network. With the TB3 as client and the RemotePC as a server, I was getting 280 Mbits/sec bandwith. With the TB3 as a server and the RemotePC as client, I was getting 320 Mbits/sec bandwith.

ghost commented 4 years ago

My problem is solved. It was an issue where the objects in the room where not exactly the same as when the SLAM map was generated. Therefore, the costmap had troubles being generated. Thanks for all the suggestions everyone!