autorope / donkeycar

Open source hardware and software platform to build a small scale self driving car.
http://www.donkeycar.com
MIT License

Integrate 2D Lidar data into Donkeycar #910

Open · Ezward opened this issue 3 years ago

Ezward commented 3 years ago

This is a discussion issue. The topic is how to integrate 2D lidar data, as from an RPLidar or YDLidar, into the DonkeyCar framework. It is likely that this will result in a number of follow-on issues to actually implement the design that we decide upon. I'll start the discussion by adding a strawman as the first comment in this ticket.

So I realize there is a lot of work already done here. donkeycar/parts/lidar.py has code to read the RPLidar and YDLidar and has code for plotting the data. It even has BreezySLAM. So when reading the strawman below, know that a bunch of this work is already done. I think the big piece we need to do is to integrate the camera and 2D lidar images so they can be fed to the CNN. Then we need to make sure our support for various lidar models and brands works in that regard. Finally, we may want to handle the issue of data skew while moving so our scans are more accurate. Also, I think we can optimize the RPLidar driver a little more; it can filter the data as it is collected and it can insert the data already sorted so we can avoid sorting each time we read the data.

Ezward commented 3 years ago

Integrating 2D Lidar data into Donkeycar

The 2D Lidar data

A spinning lidar captures data in a 360 degree arc around it. It provides each data point as a distance from the lidar and the angle at which that reading was taken. It has some maximum working range, which can be thought of as the radius of a circle with the lidar at its center. Beyond that range we may still get some readings, but we treat those as noise.

The lidar's 'zero' angle is relative to itself, and so is all the data that it provides to us (it's super narcissistic). But we care about where the points are relative to the vehicle that the lidar is mounted on. If we treat directly forward as the 'zero' angle for the vehicle, then we need to know the offset of the lidar's zero angle relative to the forward direction of the vehicle. We can then adjust the angle component of each data point provided by the lidar to turn it into an angle relative to the vehicle. It may well be that, given how the lidar is mounted, its zero angle exactly coincides with forward on the vehicle, but we should include this offset (which in that case would be zero) anyway, so we have a more general model that can be applied to other lidars that may not have their zero in the same place.
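For concreteness, the angle adjustment might look something like the minimal sketch below; `zero_offset_deg` is a hypothetical configuration value, not an existing Donkeycar setting.

```python
# Minimal sketch: convert a lidar-relative angle to a vehicle-relative angle.
# zero_offset_deg is a hypothetical config value: where the lidar's 0-degree
# mark points, measured from the vehicle's forward direction.
def lidar_angle_to_vehicle_angle(lidar_angle_deg, zero_offset_deg):
    """Return the measurement angle in degrees, relative to vehicle forward."""
    return (lidar_angle_deg + zero_offset_deg) % 360.0
```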

Plotting the 2D Lidar data

So once we turn the lidar data into a distance and angle relative to the vehicle, we can use trigonometry to calculate (x, y) coordinates relative to the vehicle; the vehicle is defined as being at (0, 0). We would likely want to set a bounding box in the 2D data space so we can clip data we don't want. We would also want the dimensions of the bitmap we are to render onto.
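A minimal sketch of that math, assuming angles are measured in degrees clockwise from the vehicle's forward (+y) axis and distances are in meters; the bounding box and image size values are illustrative, not existing configuration.

```python
import math

def lidar_point_to_pixel(distance_m, angle_deg,
                         bbox=(-8.0, 8.0, 8.0, -8.0),   # (left, top, right, bottom) in meters
                         image_size=(120, 120)):        # (width, height) in pixels
    """Convert one (distance, angle) reading to an (x, y) pixel, or None if clipped."""
    # Polar to cartesian; vehicle at (0, 0), angle measured clockwise from forward (+y).
    x = distance_m * math.sin(math.radians(angle_deg))
    y = distance_m * math.cos(math.radians(angle_deg))

    left, top, right, bottom = bbox
    if not (left <= x <= right and bottom <= y <= top):
        return None  # outside the area we care about

    # Scale world coordinates into pixel coordinates (origin at top-left of the image).
    width, height = image_size
    px = int((x - left) / (right - left) * (width - 1))
    py = int((top - y) / (top - bottom) * (height - 1))
    return px, py
```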

A complication

Another thing to think about is the skew inherent in data taken from a moving vehicle. Remember that each data point provided by the lidar is given as a distance from the lidar and the angle at which the lidar was pointing when it took the reading. But if the lidar is on a moving vehicle, then the position from which each point is taken is changing. The lidar has some rate at which it can make a full 360 degree sweep, typically 5 to 10 Hz. If the vehicle is stopped, then all data points are from a circle with the same center. However, as the vehicle moves, each point is taken from a different location. If the vehicle is moving 1 meter per second, which is pretty slow for a race car, and the lidar is spinning at 5 Hz, then the lidar has moved 1/5 of a meter between taking the first point in a sweep and the last point. There is additional skew if the vehicle is turning.

A solution

If we know the vehicle's position and orientation as it moves, then we can simply adjust each point provided by the lidar using the vehicle's position and orientation at the time that point was taken.

If we have good odometry, then we can apply a kinematic model that estimates the vehicle's position and orientation as it moves. Such estimates are good over short distances, and we only need them to be good over a meter or so. If we have wheel encoders or an encoder on the drive motor, then we can use those. If we have an IMU, then we can also use that for the same purpose. Having both would be ideal.

But what if we don't have good odometry? That is typical of a Donkeycar. We know the throttle value and steering value, but we don't actually know the speed of the car. However, with a little calibration we can roughly estimate the speed of the vehicle from the throttle value. The calibration would involve driving a known distance at several different constant throttle values and measuring the time to traverse the distance. Then we can interpolate to estimate the speed from any throttle value. That provides only a very rough estimate: it is affected by the particular surface the vehicle is on and by the battery charge level, and if the vehicle is changing speeds this model does not account for the lag in acceleration and deceleration. However, it is better than not doing any adjustment of the data at all.
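As a rough sketch of that calibration idea (the throttle/speed pairs below are made-up example measurements, not real data):

```python
import numpy as np

# Made-up calibration table: constant throttle values and the measured speed
# (known distance divided by elapsed time) at each one.
THROTTLE_SAMPLES = [0.2, 0.4, 0.6, 0.8, 1.0]
SPEED_SAMPLES_MPS = [0.4, 1.1, 1.9, 2.8, 3.6]

def estimate_speed(throttle):
    """Linearly interpolate a rough speed estimate (m/s) from a throttle value."""
    return float(np.interp(throttle, THROTTLE_SAMPLES, SPEED_SAMPLES_MPS))
```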

We need to calibrate the steering as well: measure the radius of a full right and full left turn; then we can interpolate to estimate the turn angle given a steering value.

With those two things in place, we can then use the speed and steering angle as input to a kinematic model to estimate the vehicle's relative position and orientation over time.
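A minimal sketch of such a kinematic model (a simple bicycle model; `WHEEL_BASE_M` is a hypothetical value, and this is not an existing Donkeycar part):

```python
import math

WHEEL_BASE_M = 0.25  # hypothetical distance between front and rear axles, in meters

def update_pose(x, y, heading_rad, speed_mps, steering_rad, dt_s):
    """Advance the estimated (x, y, heading) by one time step using bicycle kinematics."""
    x += speed_mps * math.cos(heading_rad) * dt_s
    y += speed_mps * math.sin(heading_rad) * dt_s
    heading_rad += (speed_mps / WHEEL_BASE_M) * math.tan(steering_rad) * dt_s
    return x, y, heading_rad
```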

A Donkeycar 2D Lidar pipeline

Acquiring the data from the 2D lidar and transforming it into data useful to the CNN involves integrating a number of new parts into the vehicle pipeline. The 2D Lidar pipeline involves four parts that work together to gather data from the 2D lidar, adjust for the changing position of the vehicle, and render it to a bitmap so it can be saved to the tub.

  1. Encoder Part: If we have a motor encoder, then this part reads the motor encoder and writes the value and the time in nanoseconds at which the data was read. Donkeycar does have an encoder.py part that assumes we have an Arduino or similar microcontroller to read the encoder. That part defines how encoder data is written to the pipeline. We should try to use that if we can.

  2. Kinematics Part: This part takes as input the encoder reading (if we have an encoder), the throttle value and the steering value. It uses the encoder reading if available or throttle otherwise, and the steering value along with other relevant configuration and calibration values and maintains an estimate of the vehicle's position and orientation. It outputs the position and orientation to the data pipeline along with a high resolution timestamp (taken from the encoder value if available). Donkeycar does not currently have this part, but it would be a great part to add; we could use it beyond the 2D Lidar pipeline for things like navigating around an obstacle.

  3. 2D Lidar Part: This part takes as input the kinematic data (position and orientation of the vehicle). This part also reads data from the 2D lidar. It maintains a queue of all the points in the last 360 degrees of data; both the raw data received by the lidar and an adjusted value for the distance and angle from the vehicle that takes into account the kinematic data. Each point should also be marked with a time in nanoseconds at which it was read. The size of the data queue depends upon the resolution of the lidar (which should be configurable). The part outputs the queue into the vehicle's data pipeline so it can be processed by other parts and possibly saved to the tub.

  4. Lidar Imaging Part: This part uses the 2D lidar queue, a lidar data bounding box, an input image and an image bounding box, and renders the 2D lidar data as pixels on the image. This image is output to the pipeline so it can later be saved to the tub. I'm suggesting keeping the part that reads the lidar data separate from this image generation part so that a) we can implement different 2D lidar parts for different brands of lidar without having to include the image code with them and b) we can use the 2D lidar data in other ways (like obstacle detection) without outputting an image.

    • The bounding box of lidar readings to be included in the image is configured in myconfig.py. The bounding box units would be meters. So if our 2D lidar had a maximum range of 8 meters and we wanted all the data around the vehicle to be plotted, we would configure a bounding box (left, top, right, bottom) of (-8, 8, 8, -8) using cartesian coordinates and assuming the lidar is at (0, 0). By specifying a bounding box of (-8, 8, 8, 0) we could ignore all the points behind the vehicle. Further, we could constrain the distance even more to eliminate data that is not important (that is too far away from the vehicle); for instance, if we only cared about things up to 4 meters in front of the vehicle and 2 meters on each side, we would specify the bounding box (-2, 4, 2, 0).
    • The image bounds are configured in myconfig.py. This represents the area on the input image where the lidar data would be drawn (see the configuration sketch after this list).
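For concreteness, the configuration might look something like the sketch below; these names are hypothetical and only illustrate the two bounding boxes described above.

```python
# Hypothetical myconfig.py entries for the proposed pipeline; not read by any existing part.
LIDAR_BBOX = (-2.0, 4.0, 2.0, 0.0)    # (left, top, right, bottom) in meters around the vehicle
LIDAR_IMAGE_BBOX = (0, 0, 120, 120)   # (left, top, right, bottom) in pixels on the input image
```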

We also want a part that will convert the normal RGB camera images into a bird's eye view image using a camera calibration matrix. This bird's eye view would then be the input image for the Lidar Imaging Part and the resulting combined image would be input to the CNN. See a more detailed discussion below.

A Neural Network using 2D Lidar data

It may be that there are other neural network architectures that would use the 2D lidar data directly, but Donkeycar is built around a CNN that takes images in and outputs steering and throttle. So if we stick with that, then we want to render the 2D lidar data as an image and use that image as an input to the CNN. I can think of two ways to do that. One is pretty simple and one is more complex, but likely better.

One way to use the lidar data is to concatenate the 2D lidar image with the camera image we already have. So if we are capturing a 160x120 camera image, we would want our 2D lidar image to be the same width (160 pixels); then we would use OpenCV or numpy to 'stack' the two images and make them into a single image. That single image would then be the input to our CNN. Note that we would need to change the code that builds the CNN to use the dimensions of this new image, not the dimensions of the camera image. Further, when we drive on autopilot, we need to do this concatenation step and use the concatenated image to infer the steering and throttle.
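That stacking step is straightforward; a minimal sketch, assuming both images are uint8 arrays of the same width:

```python
import numpy as np

def stack_images(camera_img, lidar_img):
    """Vertically stack the camera image and the lidar image into one CNN input."""
    assert camera_img.shape[1] == lidar_img.shape[1], "image widths must match"
    return np.vstack((camera_img, lidar_img))
```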

I think a model built on that might be OK, but it's hard to know without testing it. It definitely would slow things down because the image would be larger. Also, if you think about it, it's not super intuitive that this would work well. The two images are in two different 'spaces'; it would not be obvious to a person how they correlate, and it might not be obvious to a CNN either.

Another way, which I think would be better, would be to overlay the 2D lidar image onto the camera image in a way where the lidar pixels are correctly placed in the image. Each lidar data point represents a point in space around the vehicle where something exists, because it reflected that light. We should be able to place the 2D lidar pixel onto the pixel in the camera image that represents the spot in 3D space that reflected that light.

I think there are two possible ways to do this; the first approach attempts to take the Lidar data and project it into the 3D space that the camera image represents. The second method instead projects the camera image into a bird's eye view and then overlays the 2D lidar data on it. These two alternatives are discussed in more detail.

Option A: we could project the Lidar data points onto the camera image, applying a 3D transform that positions the pixels correctly in the image. To do this we must calibrate the camera and use that calibration to understand how to project the points in the 2D lidar data into the camera's world view. Remember that each Lidar point represents a point in the 3D world space around the vehicle. In this option, we take that 3D point and calculate the corresponding pixel in the camera image and draw it there. This is kinda cool, but I don't think it is the best way to do this because of perspective; as points get farther away, they are likely to occupy the same pixel as a nearby point because of perspective. That means we start to lose data.

Option B: To avoid the perspective data loss issue, we could use the camera calibration values to reformat the camera image so it appears as a bird's eye view, as if we are looking at it from above. We essentially reformat the image so it looks like it was taken from directly above the Donkeycar. Then it is basically trivial to overlay the 2D lidar data, because it already is like a bird's eye view. We merge the pixels from the 2D lidar image into the bird's eye camera view. Even better, we can generate this bird's eye camera view before we render the 2D lidar image and, rather than render the 2D lidar image to its own bitmap, draw the pixels directly on the bird's eye camera image. That would be faster because we avoid merging two bitmaps.
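A rough sketch of that overlay step, assuming we already have a 3x3 homography `H` (from camera calibration or reference points) that maps camera pixels to a top-down view, and that the lidar points have already been converted to pixels in that top-down frame:

```python
import cv2
import numpy as np

def birdseye_with_lidar(camera_img, H, lidar_pixels, out_size=(120, 120)):
    """Warp the camera image to a bird's eye view and draw lidar returns onto it.

    H: 3x3 homography mapping camera pixels to top-down pixels.
    lidar_pixels: iterable of (px, py) already in the top-down pixel frame.
    """
    birdseye = cv2.warpPerspective(camera_img, H, out_size)
    for px, py in lidar_pixels:
        if 0 <= px < out_size[0] and 0 <= py < out_size[1]:
            birdseye[py, px] = (0, 0, 255)  # mark lidar returns in red (BGR)
    return birdseye
```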

I think this bird's eye image would make the 2D lidar data much more powerful in predicting steering angle and throttle (assuming we slow down for obstacles). This approach would require another part; a part that takes the camera calibration data and the camera image and generates a bird's eye camera image. We would also make the Lidar Image Part accept an input image and rather than image dimensions, it would get a second bounding box that represents the area on the input image (the bird's eye camera image) where the 2D lidar data should be drawn.

I'm getting pretty excited about this second approach. It has other benefits.

This approach does require another calibration step: calibrating the camera. This is a well-known process and there are plenty of Python code samples to generate the necessary calibration data. Further, there is plenty of Python code out there to apply this calibration data to turn a camera image into a bird's eye view; this is a common process in many robotics projects. The bird's eye view work is broken out into issue 1096.
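As a rough illustration of that calibration step (standard OpenCV chessboard calibration, not Donkeycar-specific code; the image folder and pattern size are assumptions):

```python
import glob
import cv2
import numpy as np

PATTERN = (9, 6)  # inner corners of the printed chessboard used for calibration
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob('calibration_images/*.jpg'):  # hypothetical folder of captured images
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# camera_matrix and dist_coeffs are what we would use to undistort images
# before computing the top-down (bird's eye) warp.
ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```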

Heavy02011 commented 3 years ago

Great preparation, Ed!

After digging a bit into lidars during the iros2020 race and currently trying to find the bug when trying to save sim lidar data into tubs (SIM_RECORD_LIDAR = True) I would like to add the following feature requests and offer to help:

  1. Make the use of the lidar identical for the physical & sim world by establishing an identical pipeline / data format for lidar (I have an RP A2M8 at hand to test real & sim training).
  2. [x] Ability to set the position of cameraA, cameraB and lidar independently of each other. I would like the camera position a bit higher, like the camera position of the Old Guy Bots, in order to avoid cropping images. Currently offset_x,y,z allows only one position, because inputs are collapsed into one position in TcpCarHandlers.cs in sdsim, as far as I experienced when trying to set them differently. SOLVED, soon to be merged into master.
  3. I hacked together a very crude part for detecting lanes and undistorting images to a top down view some time ago; much refactoring is needed before using it. https://gist.github.com/Heavy02011/07320c09c1870ef1e3a00878801fdfc9

Maybe we can have a look at the Autoware.auto pipeline as well, https://www.apex.ai/autoware-course, see Lecture 7 | Object Perception: LIDAR.

TCIII commented 3 years ago

@Ezward, Thanks for initiating the discussion ticket, much appreciated. @Heavy02011, thanks for chiming in too.

I presently have three Slamtec RPLidars available to support this effort: an A1M8-R5, an A1M8-R6 (latest firmware) and an A2M8 (latest firmware). I have one running on an RPi 4B, one running on a Nano 4GB, and the A2M8 running on a Xavier NX. The lidar.py part (DC dev branch) works well sending distance data to the TubWriter based on a lowest-to-highest angle value data stream and is limited to a 90 - 180 degree forward field of view. Please note that the A1 versions start the 0 degree point at the motor end of the Lidar chassis and scan clockwise towards the turret end. The A2 starts the scan at the front of the turret on a centerline drawn from the cable end through the turret center and out the front, and scans clockwise. We might want to look at Dirk Prange's "Example" of sensor integration in the "Create your own Model" guide as a start?

TCIII commented 3 years ago

@Ezward,

I really like option B as it would allow me to operate my robot chassis without the need for lanes, as in a traditional DC environment, and allow for autonomous movement around an outside or inside perimeter. Though it is beginning to sound like we might need an SBC with more processing power than an RPi 4B, such as the Nano 4GB or a Xavier NX?

TCIII commented 3 years ago

@Ezward, Leave this open for useful links:

Paper on Lidar/camera fusion

Fast RPLidar Uses C/C++ and a python wrapper to improve speed

Adafruit CircuitPython RPLidar Library

Lidar based SLAM paper

A compilation of GitHub Lidar interface code

lidar_dewarping The purpose of the code here is to remove the distortions from the lidar scan.

SLAMTEC RPLiDAR A2 C++ Library

Clear Path: An interesting look at the A1M8

Heavy02011 commented 3 years ago

fyi: Maxime did an update on gym-donkeycar for solving this: “ ability to set position of cameraA, cameraB and lidar independent from each other.” I will test this today.

Ezward commented 3 years ago

@Heavy02011 thanks for those links. I see in the gist that you are using the trapezoidal method for warping the image perspective; I think we should allow for that method as it is pretty easy to configure. We can also allow the use of a camera calibration matrix if the user has calibrated their camera, as that also eliminates other distortions (like those from a wide-angle lens).
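For reference, the trapezoidal approach is typically something like the sketch below; the source/destination points are made up for a 160x120 image and are not the values from the gist.

```python
import cv2
import numpy as np

# Made-up trapezoid on the road surface and the rectangle it maps to in the top-down view.
src = np.float32([[40, 70], [120, 70], [155, 115], [5, 115]])
dst = np.float32([[40, 0], [120, 0], [120, 120], [40, 120]])
M = cv2.getPerspectiveTransform(src, dst)

def warp_to_topdown(img):
    """Warp a 160x120 camera image to a 160x120 top-down view."""
    return cv2.warpPerspective(img, M, (160, 120))
```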

TCIII commented 3 years ago

> @Heavy02011 thanks for those links. I see in the gist that you are using the trapezoidal method for warping the image perspective; I think we should allow for that method as it is pretty easy to configure. We can also allow the use of a camera calibration matrix if the user has calibrated their camera, as that also eliminates other distortions (like those from a wide-angle lens).

@Ezward,

How will this affect depth cameras like the Intel D435i depth camera as I would like to use the Slamtec A2M8 with that depth camera? Also, how is the data from your realsense435i.py part integrated into the DC training pipeline?

Ezward commented 3 years ago

In terms of the data format that is generated by the 2D Lidar part, I think it makes sense to save all the data on each frame, even though we may only be getting some of it on each frame. What I mean is that because the lidar streams in data in segments as it collects it, we will constantly be reading some subset of the 360 degrees. However, we want to maintain a full array of data with each frame we save to the tub.

So that means we want to stream data in and update any prior data as the new data comes in. The donkey pipeline is running at 20 Hz or more and the lidar is only running somewhere between 4 and 10 Hz. We don't want to wait for a whole 360 degrees of data before updating the data. I think most 2D lidars will stream data and tell you the angle range it represents, which will be some subset of the 360 degree arc.
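A minimal sketch of that idea, keeping one distance slot per angle and updating whichever angular segment streamed in since the last loop; the angular resolution is a hypothetical value.

```python
import numpy as np

ANGLE_RESOLUTION_DEG = 1.0                          # hypothetical; depends on the lidar
NUM_SLOTS = int(360 / ANGLE_RESOLUTION_DEG)
distances = np.zeros(NUM_SLOTS, dtype=np.float32)   # 0.0 means "no reading yet"

def update_scan(measurements):
    """measurements: iterable of (angle_deg, distance_m) from the latest partial read."""
    for angle_deg, distance_m in measurements:
        index = int(angle_deg / ANGLE_RESOLUTION_DEG) % NUM_SLOTS
        distances[index] = distance_m
```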

The ROS message for 2D lidar is here: http://docs.ros.org/en/noetic/api/sensor_msgs/html/msg/LaserScan.html . The nice thing about this format is that it can hold some subset of the 360 degree arc or it can be used to hold all 360 degrees.

I can also see that we may want to tell the Donkeycar part that we only care about data from 0 to 180 degrees, for instance, so we are not writing more data than we will actually use. Again, we can still update this range as it streams in. We would just ignore data from 180 to 360 degrees in that case.

Ezward commented 3 years ago

> How will this affect depth cameras like the Intel D435i depth camera as I would like to use the Slamtec A2M8 with that depth camera? Also, how is the data from your realsense435i.py part integrated into the DC training pipeline?

I would hope camera and lidar would be totally orthogonal; they should not interfere with each other. Of course, if you have both of these you may have a bunch of redundant data and so are writing more data than you need, but it should work, short of some underlying driver incompatibilities anyway.

I'm going to try the 2D lidar without the D435, with just a normal RPi 160 degree camera. But I don't see any reason why the D435 and RPLidar would interfere, unless we define the RPLidar part as a camera; since we only configure one camera at a time, the 2D Lidar part should not be defined as a camera.

Rather, I think we should have a new kind of part for 2D planar lidar; then we can implement config and code for the various vendors. So, just as for cameras we have CAMERA_TYPE, maybe we can add a config for LIDAR_TYPE. Here I would create values for families of 2D lidars that share a basic driver API and underlying data format; for instance, all of the RPLidar models might speak to the same driver and produce the same data format, but the data may differ by:

We would have additional configuration common to all 2D lidars: LIDAR_MIN_DISTANCE and LIDAR_MAX_DISTANCE in meters, which would be important for filtering out data. We probably also need configuration for the angle range we want to capture: LIDAR_LOWER_LIMIT and LIDAR_UPPER_LIMIT in degrees, again so we can filter out data we don't care about.

I'm not sure how to handle configuring the number of samples we can expect in that desired range; that is determined by how fast the lidar is spinning and the number of readings per second. I believe readings per second are generally fixed, but the motor speed is commonly adjustable via PWM. So perhaps we have config for LIDAR_PPS (pulses per second), which will depend upon your specific lidar. If we put that in configuration, then we can allow the driver to determine the best spin rate and calculate the number of readings it needs to save based on pulses per second and the spin rate it chooses. For example, the RPLidar model A1M8-R6 collects 8000 samples per second. If the Donkeycar part for that lidar chooses a rotation rate of 10 Hz, then we can expect 800 samples per rotation. If the user configures LIDAR_LOWER_LIMIT=0 and LIDAR_UPPER_LIMIT=180, so we are only looking at 0 to 180 degrees, then we can expect 400 samples that we want to keep.
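That arithmetic as a sketch (LIDAR_PPS is a proposed config name, not an existing one):

```python
# Proposed (not existing) config values used to derive the expected sample count.
LIDAR_PPS = 8000            # samples per second, e.g. RPLidar A1M8-R6
LIDAR_SPIN_RATE_HZ = 10     # rotation rate chosen by the driver
LIDAR_LOWER_LIMIT = 0       # degrees
LIDAR_UPPER_LIMIT = 180     # degrees

samples_per_rotation = LIDAR_PPS / LIDAR_SPIN_RATE_HZ                                      # 800
samples_in_range = samples_per_rotation * (LIDAR_UPPER_LIMIT - LIDAR_LOWER_LIMIT) / 360.0  # 400
```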

Ok, after writing all that, I see we do have LIDAR_TYPE with types RP and YD, and config for LIDAR_LOWER_LIMIT and LIDAR_UPPER_LIMIT, already in a part called lidar.py. I'm catching up! We may want to add config for LIDAR_PPS to support the various RPLidar models. I will update the part to support the LDS lidar (I'll probably implement an Arduino for reading from the lidar, then use the serial port to send the data to the RPi, so it will be similar to the RPLidar).

TCIII commented 3 years ago

@Ezward,

Yes, I was going to recommend that you look at zlite's lidar.py part to see how the data capture range is limited and how many lidars are presently supported. Unfortunately, this data capture range limiting only works for the A1M8, which references the zero degree start point of the turret scan at the back of the chassis where the motor is located. The Slamtec A1M8 configuration sets the 0 degree point at the motor end of the Lidar chassis. The A2M8 sets the 0/360 degree point at the front of the scanner on a centerline running from the cable end through the scanner axis and out the front of the scanner. This link may help visualize how the A1M8 scans (see "Useful Knowledge"). This link may help visualize how the A2M8 scans (see page 11). rplidar_A1 rplidar_A2

Ezward commented 3 years ago

> @Ezward,
>
> Yes, I was going to recommend that you look at zlite's lidar.py part to see how the data capture range is limited and how many lidars are presently supported. Unfortunately, this data capture range limiting only works for the A1M8, which references the zero degree start point of the turret scan at the back of the chassis where the motor is located. The Slamtec A1M8 configuration sets the 0 degree point at the motor end of the Lidar chassis. The A2M8 sets the 0 degree point at the cable end of the Lidar base and proceeds in a clockwise sweep around the base back to the cable end starting point.

I think we can detect the zero point on any 2D lidar; we simply look at the last reading we saved: if the current reading's angle is less than the previous reading's angle, then we have crossed the zero point. We can double-buffer the scan; when we cross zero, we start a new buffer and move the last buffer (which should be a full 360 degrees) to a 'last-buffer' status. Further, I think we can merge these two buffers in a way that gets us a single 360 degree scan, which may include points from the latest scan and points from the previous scan, but no redundant angles. So I don't think we actually need the zero point provided by the RPLidar, but we do need to know where the lidar considers zero degrees relative to 'forward', so we can adjust the output of all lidars so they have zero as forward.
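A minimal sketch of that zero-crossing detection and double buffering (illustrative only, not the existing lidar.py code):

```python
class ScanBuffer:
    """Accumulate lidar points and swap buffers whenever the angle wraps past zero."""

    def __init__(self):
        self.current = []      # points collected since the last zero crossing
        self.last_sweep = []   # the most recent complete 360-degree sweep
        self.prev_angle = None

    def add(self, angle_deg, distance_m):
        if self.prev_angle is not None and angle_deg < self.prev_angle:
            # The angle decreased, so we crossed zero: the current buffer is a full sweep.
            self.last_sweep = self.current
            self.current = []
        self.current.append((angle_deg, distance_m))
        self.prev_angle = angle_deg
```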

TCIII commented 3 years ago

@Ezward,

Even though there is a YDLidar class in the lidar.py part, zlite told me that it is not presently functional. The Slamtec A1 and YD X4 lidars are probably the only two reasonably priced lidars around, with both the A2M8 and YD G4 coming in at over $300 plus tax on Amazon.

TCIII commented 3 years ago

@Ezward,

Lidar and Camera data fusion GitHub links:

Lidar and Camera Fusion for 3D Object Detection based on Deep Learning for Autonomous Driving

CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection

SFND 3D Object Tracking

Sensor Fusion NanoDegree - Camera Course (looks to be the best)

lidar-camera-fusion Lidar/Camera calibration

Ezward commented 3 years ago

Here are a couple of libraries that can manipulate point clouds and do a few other things (like scene segmentation)

Ezward commented 1 year ago

Discussion of Lidar odometry and deskewing lidar data (skewing that happens when scan data is acquired on a moving platform): https://youtu.be/9FhKgAEQTOg

Here is the GitHub for the KISS ICP library, which includes Python bindings: https://github.com/PRBonn/kiss-icp

This is the associated paper: https://arxiv.org/pdf/2209.15397.pdf

Here is an article on lidar odometry based on ICP using 2D lidar https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8587105/

Here is another article on 2D lidar odometry using ICP. Includes a code sample http://andrewjkramer.net/lidar-odometry-with-icp/