carla-simulator / carla

Open-source simulator for autonomous driving research.
http://carla.org
MIT License

Camera Depth Projection Inconsistency/Distortion #1379

Closed hongyuli closed 5 years ago

hongyuli commented 5 years ago

Hi, I have been following and using Carla for a few months. I appreciate the Carla team providing such a powerful simulator.

The camera depth projection inconsistency/distortion issue was raised in earlier versions, but it has not been fixed. I really want to bring this to the team's attention again, since this is a very important feature for us!

(1) Short summary of the issue

If two vehicles are placed at different locations but facing similar views, the point clouds generated from their depth cameras should match. Since the simulator camera should have no distortion, we would expect a perfect match; even if the match were not perfect, there should not be such a large discrepancy. In reality, however, the point clouds from the different views do not match, as shown in the screenshot below. The areas marked in red (the traffic light and the bicyclist) should align, but they are mismatched.

[Screenshot: overlaid point clouds from the two views; the traffic light and bicyclist marked in red do not line up]

(2) Projection method

I have checked all the related issues but could not find a solution. The basic workflow is: (i) first project the depth pixels into the camera coordinate system, as shown in #553 and #56; (ii) then project them into the world coordinate system:

    camera_to_car_transform = Transform(Rotation(roll=-90, yaw=90), Scale(x=-1))
    world_transform = the transform we got from the depth image
    car_to_world_transform = world_transform * camera_to_car_transform
    pointscloud_in_world_system = car_to_world_transform.transform_points(pointscloud_in_camera_coordinate_system)

Please correct me if there is anything wrong.
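For concreteness, here is a minimal numpy sketch of what I am doing (a simplification, not my exact code: K is a pinhole intrinsic matrix built from the image size and FOV, and camera_to_world stands for the combined 4x4 matrix, i.e. the axis change above folded together with the camera's world transform):

    import numpy as np

    def depth_to_world(depth, fov_deg, camera_to_world):
        # depth: (H, W) array of forward distance per pixel, in meters.
        h, w = depth.shape

        # Pinhole intrinsics: focal length in pixels from the horizontal FOV.
        f = w / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
        K = np.array([[f, 0.0, w / 2.0],
                      [0.0, f, h / 2.0],
                      [0.0, 0.0, 1.0]])

        # Homogeneous pixel coordinates [u, v, 1] for every pixel.
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        p2d = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])

        # P_cam = (inv(K) @ p2d) * depth -> points in the standard camera frame
        # (x right, y down, z forward).
        p_cam = np.linalg.inv(K) @ p2d * depth.ravel()

        # camera_to_world must include the change to UE4's x-forward/y-right/z-up
        # convention (the Rotation(roll=-90, yaw=90), Scale(x=-1) step above) as
        # well as the camera's pose in the world.
        p_cam_h = np.vstack([p_cam, np.ones(h * w)])
        return (camera_to_world @ p_cam_h)[:3].T   # (H*W, 3) world points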

(3) Why is this important?

Since Carla does not provide object detection or instance segmentation ground truth, we have to generate the ground truth ourselves in order to fine-tune/re-train our models. Depth-image-based 3D projection is one of the steps in generating it (projecting the segmentation ground-truth image into 3D space and matching it against each vehicle's bounding box yields the instance segmentation ground truth). However, due to this camera depth projection inconsistency/distortion, the generated ground truth is not reliable and cannot be used to re-train/fine-tune the models.

To participate in the Carla challenge and move forward with my own research, I really need a correct depth image projection. The weird 2D bounding box issue in #738 may also be caused by this problem, and there are other related issues as well.

I understand the team is quite busy, but if this remains an issue it will become a barrier to using Carla for further experiments. I have tested versions 0.9.2, 0.9.3 and 0.9.4. Since I need the multi-agent API, I cannot go back to 0.8.x.

Thanks.

Luyang1125 commented 5 years ago

I got a similar problem when using the depth sensor in Carla. Please let us know if there are any future plans to fix it. Thanks.

analog-cbarber commented 5 years ago

I am not seeing problems like this with our code. I am not sure exactly how you are converting depth images to points, but it sounds like you may be using a projection matrix, which is not the correct technique due to the non-linear depth mapping. Instead, you should compute the cosines of the center of each pixel in the depth image relative to the camera orientation (you only need to do this once) and use basic trigonometry to get the camera-relative points, which you can then convert to world coordinates using the appropriate affine transform.

hongyuli commented 5 years ago

Hi, @analog-cbarber ,

Thanks a lot for your response. I did use a projection matrix, since @marcgpuig provided such a solution in #56:

Given a 2D point p2d = [u, v, 1], your world point position P = [X, Y, Z] will be: P = (inv(K) * p2d) * depth

I'm wondering whether you could provide sample code to perform the projection. It would be really appreciated!

analog-cbarber commented 5 years ago

I don't have time right now, but I will try to extract my geometry code and publish it on github at some point when I get a chance.

In the meantime, here is our function for converting a range (not depth) to a camera-relative point, which you can then translate to world or some other reference frame using the usual affine transform.

We store our data as range maps (radial distance to each pixel), so the cosine conversion is needed to get depth; you won't need that step.

    import numpy as np

    def _view_range_to_point(r: float, u: int, v: int,
                             cosine_map: np.ndarray, u2y: np.ndarray,
                             v2z: np.ndarray) -> np.ndarray:
        # Depth (forward distance) is the radial range scaled by the pixel's cosine.
        x = cosine_map[v, u] * r
        point = np.zeros(3)
        point[0] = x              # forward
        point[1] = u2y[u] * x     # lateral, from the per-column tangent
        point[2] = v2z[v] * x     # vertical, from the per-row tangent
        return point

The u2y and v2z arrays are computed by:

    def camera_tan_arrays(height: int, width: int, fov: float,
                          pixel_height: float = 1.0
                          ) -> (np.ndarray, np.ndarray):
        h2 = height // 2
        w2 = width // 2
        # Focal length in pixels, from the horizontal FOV (degrees).
        z = w2 / np.tan(np.pi * fov / 360)
        # Tangents of the viewing angle at each pixel-center column and row.
        row = np.linspace(-w2 + .5, w2 - .5, width) / z
        column = np.linspace(h2 - .5, -h2 + .5, height) * pixel_height / z
        return row, column
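The cosine_map is derived from the same tangent arrays, since the ray through pixel (u, v) has direction (1, u2y[u], v2z[v]) in the camera-forward frame. Roughly (my reconstruction here, not necessarily the exact code we use):

    def camera_cosine_map(u2y: np.ndarray, v2z: np.ndarray) -> np.ndarray:
        # cos(angle to the optical axis) = 1 / ||(1, tan_y, tan_z)||,
        # so range * cosine gives the forward depth used above.
        ty, tz = np.meshgrid(u2y, v2z)   # both shaped (height, width)
        return 1.0 / np.sqrt(1.0 + ty**2 + tz**2)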

I really do not think there is anything wrong with the CARLA depth sensor or the bounding box information (other than some small discrepancies in how the box fits different car models), and I have a colleague who has built a giant point cloud of Town01 from depth sensor data collected over multiple simulation episodes.

hongyuli commented 5 years ago

Thanks for sharing the code. I tried it, but I still can't get the correct result.

I'm wondering what 'x' is here. Is it the depth (the reading we get from the depth sensor), the range (radial distance), or a value after some conversion?

I guess the part I'm missing is just how to convert the depth value obtained from the depth sensor into 'x'.

analog-cbarber commented 5 years ago

Yes, x is the depth (I am using Unreal's relative coordinate convention), which can just be copied directly from the sensor without the cosine conversion from my function.
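That is, given a depth value d for pixel (u, v), you can skip the cosine step entirely; something like this (using the tangent arrays from camera_tan_arrays above):

    def depth_to_point(d: float, u: int, v: int,
                       u2y: np.ndarray, v2z: np.ndarray) -> np.ndarray:
        # d comes straight from the depth sensor; no cosine correction needed.
        return np.array([d, u2y[u] * d, v2z[v] * d])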

Once again, my code only gives you the coordinate of the points relative to the depth camera, not the world coordinates.

hongyuli commented 5 years ago

Got it. Actually, we are doing exactly the same process. Your code is equivalent to:

Given a 2D point p2d = [u, v, 1], your world point position P = [X, Y, Z] will be: P = (inv(K) * p2d) * depth

Are you using the same projection process from camera coordinates to world coordinates?

    camera_to_car_transform = Transform(Rotation(roll=-90, yaw=90), Scale(x=-1))
    world_transform = the transform we got from the depth image
    car_to_world_transform = world_transform * camera_to_car_transform
    pointscloud_in_world_system = car_to_world_transform.transform_points(pointscloud_in_camera_coordinate_system)

analog-cbarber commented 5 years ago

Regarding the first part, I think you may be right as long as the last row of inv(K) == [0, 0, 1], because then the depth coordinate cannot change; but that is not how we do it.
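For a standard pinhole intrinsic matrix that condition does hold; a quick check (with hypothetical intrinsics for an 800x600 image at 90-degree FOV):

    import numpy as np

    K = np.array([[400.0,   0.0, 400.0],
                  [  0.0, 400.0, 300.0],
                  [  0.0,   0.0,   1.0]])
    print(np.linalg.inv(K)[2])   # [0. 0. 1.]
    # The third component of inv(K) @ [u, v, 1] is therefore 1, so multiplying
    # by the depth leaves the forward coordinate equal to the depth itself.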

As to the second part, I don't really understand what you have written. The camera-to-car transform should map the camera coordinate system to the car coordinate system, but what you wrote looks like it is just permuting the dimensions. The transform you get from the depth image is relative to the camera, not to the world. Where do you account for the position of the car, or for the offset from the camera to the car?

analog-cbarber commented 5 years ago

In any case, I don't think there is a CARLA bug here.

hongyuli commented 5 years ago

Since the offsets from the cameras to the cars are the same in my experiments, I found that neglecting these offsets gives the same results. I also tried adding them back in, but still got the same result.

    camera_to_car_transform = Transform_relative_offset_camera_vehicle * Transform(Rotation(roll=-90, yaw=90), Scale(x=-1))
    world_transform = the transform we got from the depth image
    car_to_world_transform = world_transform * camera_to_car_transform
    pointscloud_in_world_system = car_to_world_transform.transform_points(pointscloud_in_camera_coordinate_system)

analog-cbarber commented 5 years ago

You position your camera at the origin point of the car? How would that work? The sensors would just see the inside of the car mesh.

In any case, the bug has got to be in your code somewhere.

hongyuli commented 5 years ago

Not at the origin point of the car; I set all the camera transforms to Translation(x=1, y=0, z=1.5).

hongyuli commented 5 years ago

I finally found that the problem is rounding error, not the projection process. I stored the depth image after applying ColorConverter, which reduces the precision from 32 bits to 8 bits. The correct way is to store the raw depth image instead.
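In case it helps anyone else, the raw frame can be decoded to metric depth directly, following the encoding described in the CARLA docs (a minimal sketch, assuming the 0.9.x Python API where carla.Image exposes raw_data, width and height as a BGRA buffer):

    import numpy as np

    def decode_depth(image):
        # Raw CARLA depth frames are BGRA; the depth is packed into R, G, B.
        bgra = np.frombuffer(image.raw_data, dtype=np.uint8)
        bgra = bgra.reshape((image.height, image.width, 4)).astype(np.float64)
        b, g, r = bgra[:, :, 0], bgra[:, :, 1], bgra[:, :, 2]
        normalized = (r + g * 256.0 + b * 256.0 * 256.0) / (256.0 ** 3 - 1)
        return 1000.0 * normalized   # depth in meters, full precision (no 8-bit loss)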

Sorry about the confusion. Thanks for the discussion. @analog-cbarber

marcgpuig commented 5 years ago

Hi @analog-cbarber, first of all, thanks for your answer and your help in the community :)

it sounds like you may be using a projection matrix, which is not the correct technique due to the non-linear depth mapping

As far as we know, our depth is linear, and you can compute the 3D points using a projection matrix, since these 2D images have been generated using the inverse projection matrix in Unreal. Please correct me if I'm wrong. Cheers!

analog-cbarber commented 5 years ago

Yes, there is nothing wrong with what CARLA is reporting. I was just speculating that if one were to use a homogeneous projection transform matrix, the resulting depth from the transform would not be linear.