AutonomyLab / ardrone_autonomy

ROS driver for Parrot AR-Drone 1.0 and 2.0 quadrocopters
http://wiki.ros.org/ardrone_autonomy
BSD 3-Clause "New" or "Revised" License
356 stars 226 forks source link

Discussion on Driver's Threading Architecture #42

Open mani-monaj opened 11 years ago

mani-monaj commented 11 years ago

This issue is related to #39 & #41

First I restate one of my previous comments with modifications:

I did more tests on "realtime" and "legacy" publishing of "Navdata" and here are my thoughts:

"Full Speed Navdata" is not a good idea. The current implementation bypasses main ROS loop running in a different thread while instantly publishes the data which may generate undefined behaviour. I say this because I could not find any information about ROS that states it is safe to bypass the same ROS loop from another thread.

When navdata_demo is turned on and the Drone is publishing at 200Hz, the main ROS loop can not catch up even with 500Hz loop rate. I think this is because of expensive memory copies done in video and navdata callback functions, many locks/unlocks between two threads and other custom code/ROS overheads during the ROS loop.

In order to examine it more, I put a ROS_INFO inside this if block to print the difference between copy_current_frame_id and last_frame_id. If ROS loop is in sync with drone update loop, this number should be always 1, otherwise it shows the number of navdata misses between consequent ROS updates which consequently leads to frame rates less than 200Hz. The result was as I expected. There are lots of misses when the looprate is set to 200Hz. Increasing that number will lead to better navdata frame rate (and less misses). In my case setting that number to 1000Hz would lead to 200Hz navdata update rate.

In summary, I suggest to remove the "Full Speed Navdata". Anyone who is interested in getting 200Hz navdata update, can increase the looprate variable appropriately.

The current driver thread model is that ROS update loop (ARDroneDriver::run()) is being run in another thread alongside SDK2's video_thread and navdata_update using SDK2 threading macros. The problem of the ROS thread being unable to catch up with fast (200Hz) navdata_update is that the ROS loop has a wait inside it. While ROS loop is [busy] waiting by pre-defined frequency (loop_rate.sleep()), it will eventually miss a couple of data frames that is being received in the other thread in navdata_custom_process.

The reason why the fullspeed_navdata works is that it pushes the new data into ROS publish buffer for a topic which will be pushed to topic's queue on ros::spinOnce() looprate times in a second. I was wondering if it is a good idea that we re-structure the driver's code so that publish_navdata and publish_video is being called directly from SDK2 thread exactly the same way that rosDriver->PublishNavdataTypes(shared_raw_navdata) is being called now.

My concern is whether publishing to ROS from another thread is safe or not. As far as I could understand, the ROS multi-threaded spinning model is meant for subscribers/publishers from different thread, which is slightly different from our case.

I would really appreciate any comments @mikehamer, @JakobEngel.

JakobEngel commented 11 years ago

I'm only semi-experienced with ROS (never had to do with it before writing tum_ardrone), but as I understand it:

=> publishing directly in the SDK callbacks is the best and most efficient solution, while producing the smallest latency. Just make shure the topics are advertised before any other thread publishes on them ;)

The reason why even with looprate=200 some navdatas are skipped, is because thy might get stuck in some Network buffer before, that is the SDK-navdata callback is not called every 5ms, but maybe 3 times within 1ms, and then again only 15ms later. Have a look at the ros-timestamps of the navdata packages sent out, they are everything but regular, especially if other computationally-expensive stuff is running on ur pc. Its not an issue of anything taking too long (computationally), but simply of very irregular intervals, which should be solved by publishing directly in the sdk callbacks.

mikehamer commented 11 years ago

How well the loop is keeping up with the received data can be determined by comparing the frequencies of rostopic hz /ardrone/navdata vs rostopic hz /ardrone/navdata_demo.

I also inserted a printout of the time between loops in comparison to the time between navdata receptions. This confirms what @JakobEngel mentioned about network delays, which seem to be the major problem here.

In the following, we see that the loop keeps up with the navdata, when the navdata is received at a regular interval. However, there are cases where the network delay causes no packets to be received for a few loops, and cases where many navdata packets come at once.

[ INFO] [1354094907.319353778]: NAVDATA Period = 4045 usec.
[ INFO] [1354094907.321747973]: LOOP Period = 3672 usec.
[ INFO] [1354094907.323910639]: NAVDATA Period = 4551 usec.
[ INFO] [1354094907.327006364]: LOOP Period = 5192 usec.
[ INFO] [1354094907.327649755]: NAVDATA Period = 3769 usec.
[ INFO] [1354094907.332633835]: LOOP Period = 5616 usec.
[ INFO] [1354094907.334162937]: NAVDATA Period = 6462 usec.
[ INFO] [1354094907.336750306]: LOOP Period = 4156 usec.
[ INFO] [1354094907.337876744]: NAVDATA Period = 3752 usec.
[ INFO] [1354094907.341746467]: LOOP Period = 4995 usec.
[ INFO] [1354094907.342984021]: NAVDATA Period = 5085 usec.
[ INFO] [1354094907.348036933]: LOOP Period = 6279 usec.
[ INFO] [1354094907.352097578]: LOOP Period = 4012 usec.
[ INFO] [1354094907.357068480]: LOOP Period = 5033 usec.
[ INFO] [1354094907.361564322]: NAVDATA Period = 18525 usec.
[ INFO] [1354094907.362275359]: LOOP Period = 5230 usec.
[ INFO] [1354094907.366761187]: LOOP Period = 4457 usec.
[ INFO] [1354094907.372257735]: LOOP Period = 5507 usec.
[ INFO] [1354094907.372761666]: NAVDATA Period = 11139 usec.
[ INFO] [1354094907.376143762]: NAVDATA Period = 3476 usec.
[ INFO] [1354094907.376288293]: NAVDATA Period = 211 usec.
[ INFO] [1354094907.378004932]: LOOP Period = 5777 usec.
[ INFO] [1354094907.379070091]: NAVDATA Period = 2748 usec.
[ INFO] [1354094907.380202435]: NAVDATA Period = 1109 usec.
[ INFO] [1354094907.382106182]: LOOP Period = 4063 usec.
[ INFO] [1354094907.383237719]: NAVDATA Period = 3035 usec.
[ INFO] [1354094907.387215018]: LOOP Period = 5115 usec.
[ INFO] [1354094907.388007705]: NAVDATA Period = 4260 usec.
[ INFO] [1354094907.392086955]: LOOP Period = 4850 usec.
[ INFO] [1354094907.393764073]: NAVDATA Period = 6278 usec.
[ INFO] [1354094907.394070765]: NAVDATA Period = 344 usec.
[ INFO] [1354094907.398257646]: LOOP Period = 6197 usec.
[ INFO] [1354094907.400090020]: NAVDATA Period = 5962 usec.
[ INFO] [1354094907.401839753]: LOOP Period = 3592 usec.
[ INFO] [1354094907.406114524]: NAVDATA Period = 5999 usec.
[ INFO] [1354094907.409146595]: LOOP Period = 7064 usec.
[ INFO] [1354094907.411828679]: LOOP Period = 2892 usec.
[ INFO] [1354094907.412320936]: NAVDATA Period = 6071 usec.
[ INFO] [1354094907.416609960]: NAVDATA Period = 4444 usec.
[ INFO] [1354094907.418218664]: LOOP Period = 6430 usec.

One improvement I am implementing at the moment, is saving the ros time of packet reception, rather than the time of packet processing. This will ensure that the navdata output by the loop is stamped correctly.

Although all in all, I favor the approach that we output packets (and video) when received – we just need to make sure these functions are fast enough to keep up with the send rate.

What is also interesting, is that the loop period varies so much. The loop_rate.sleep(...) command should sleep for enough time such that a constant loop rate is achieved...

mikehamer commented 11 years ago

Continuing the discussion with respect to #44.

Implementation of realtime video publishing (and to some degree, navdata as well), indicates that the current driver architecture is not so well suited to this task, requiring a few hacks and lots of shared data to enable these functions.

The biggest problem we have, are the calls from ardrone_sdk.cpp and video.cpp, to what I have now made public functions in the ARDroneDriver class. These calls require a shared class, whose class definition is in ardrone_driver, and whose global variable declaration in ardrone_sdk. This is a clear dependency from sdk to driver, but the reverse dependency also exists, with driver dependent on shared data and parameters loaded in sdk.

I think we could solve a lot of these problems by externalizing the video and navdata publishing methods, which wouldn't be such a drastic change to the architecture.

Optimally though, perhaps thinking about how we can convert the currently external methods into internal driver methods may be a better option, leaving the sdk.cpp file to simply setup the ARDroneSDK environment and callbacks to driver functions. This way all execution outside of the initialization would be done internally to the driver.

What are your thoughts?

mani-monaj commented 11 years ago

Thanks for all of your feedbacks and tests @mikehamer & @JakobEngel .

publishing from any thread is fine, at least as long as no two threads publish on the same topic at the same time.

Fair enough.

spin() has nothing to do with publishing

Thanks. That is completely right. I misunderstood the underlying ROS architecture.

I am currently examining the changes in #44. I think what you concluded about network delay is correct

mani-monaj commented 11 years ago

44 merged into dev-unstable, I tested it a little bit. This needs to be more tested.

mani-monaj commented 11 years ago

Thanks @mikehamer for the changes. I did not have enough time today for the in-depth testing. But it looks like it works as expected.

mani-monaj commented 11 years ago

Optimally though, perhaps thinking about how we can convert the currently external methods into internal driver methods may be a better option, leaving the sdk.cpp file to simply setup the ARDroneSDK environment and callbacks to driver functions. This way all execution outside of the initialization would be done internally to the driver.

Can you please explain it a little further?

maxpfingsthorn commented 11 years ago

I was wondering if reimplementing the relevant parts of the ARDroneSDK would have any advantage. Looking at the manual, it shouldn't be too hard to do, but I'm not sure about the final benefit. Someone did it for Java (see javadrone project on Google Code). What is your opinion about this? Is the SDK fast and clever about things? Looking at the insides of the SDK is hard, because it uses a very exotic threading model and video decoding library, probably because for portability. I have the feeling it's optimized for iOS, not really for PC. Using the FreeFlight App on my iDevice, I get much better results (less video hiccups, less control latency) than on my PC, which is orders of magnitudes faster. Have you looked into GPU-accelerated video decoding? Any decent PC has one capable of it these days.

An important optimization point, especially for image transports in ROS, would be to implement the driver as a nodelet. Other ROS nodelets running in the same container will be passed the pointer to the original data (a boost::shared_ptr) and no serialization of the ROS message is needed. Other ROS node(let)s can still receive the messages over TCP/IP as before. In high-volume data applications, like stereo vision, this often means a performance difference of an order of magnitude or so. Maybe here as well?