gdlg / panoramic-depth-estimation

Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery
https://gdlg.github.io/panoramic

Rules for Definition of Depth #4

Open lasxz opened 5 years ago

lasxz commented 5 years ago

I got the depth data from the .npy file, but I found that the nearer a point is in the panoramic image, the greater its depth value. I want to know how you define the depth. Thank you.

gdlg commented 5 years ago

Yes, sorry for the confusion. You are right.

This is the disparity value; however, this might not have been the best definition of disparity. I will try to update the code with a better definition.

In the meantime, you can get it in meters using:

```python
import math
import numpy as np

disp = np.load(file)
depth = (disp.shape[1] / math.pi) / disp
```
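As a self-contained sketch of that conversion, here is the same formula applied to a synthetic disparity map (the shape and values below are made up for illustration; the real values come from the .npy file):

```python
import math
import numpy as np

# Hypothetical 512x1024 disparity map with a constant value of 2.0.
disp = np.full((512, 1024), 2.0)

# Depth in metres: depth = (width / pi) / disparity,
# so larger disparities correspond to nearer points.
depth = (disp.shape[1] / math.pi) / disp

print(depth[0, 0])  # (1024 / pi) / 2 ≈ 162.97
```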
lasxz commented 5 years ago

I'm a little confused. You define d_equi = r/α in your paper (Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery). As I understand it, d_equi is the disparity and r is the depth. This is not consistent with your reply.

And I have another question about your paper. In the definition of T_equi (Eq. 9), c_λ and c_φ are not defined. What exactly do they represent?

gdlg commented 5 years ago

The paper is correct; however, when I released this code, I tried to clean it up, and the definition in the code no longer matches the paper. The above piece of code should work with the current version of the code. I will rectify it to be more consistent; however, I need to change both the code and the evaluation on the Carla dataset for that.

(c_λ, c_φ) is the centre point of the camera in pixels, which is likely to be (w/2, h/2). In our paper, the panorama was cropped vertically to ignore the sky and the roof of the car; however, the crop wasn't centred on the horizon. We took the crop into account in the parameter c_φ. This is not particularly important for depth estimation; however, it is required to transform the bounding boxes to the correct frame of reference for object detection.
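Concretely, one can sketch an intrinsics-style matrix in that spirit; the image size, crop offset, and sign conventions below are illustrative assumptions, not the paper's exact values from Eq. 9:

```python
import numpy as np

# Assumed panorama size and centre point (illustrative values only).
w, h = 2048, 512
c_lambda, c_phi = w / 2, h / 2      # centre point of the camera in pixels
delta = 2 * np.pi / w               # angular resolution: radians per pixel

# Equirectangular "intrinsics": maps angles (lon, lat) to pixels (u, v).
T_equi = np.array([[1 / delta, 0.0, c_lambda],
                   [0.0, 1 / delta, c_phi],
                   [0.0, 0.0, 1.0]])

# Pixel -> angle is the inverse mapping.
u, v = 1536.0, 256.0
lon, lat, _ = np.linalg.inv(T_equi) @ np.array([u, v, 1.0])
print(lon, lat)  # lon = (1536 - 1024) * delta = pi/2, lat = 0
```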

lasxz commented 5 years ago

I wrote the following MATLAB code according to Eq. 14, but I can't recover [x, y, z]. Both the panoramic images and the predicted data come from your dataset. I want to know what I am doing wrong. Thank you.

```matlab
disp = load('test_007998.out');
a = 2*pi/2048;
T_equi = [a, 0, 1024; 0, a, 150; 0, 0, 1];
fp = fopen('test.xyz', 'w+');
for i = 1:300
    for j = 1:2048
        % r(i,j) = disp(i,j)*a;
        r(i,j) = (2048/pi)/disp(i,j);

        b = inv(T_equi)*[i, j, 1]';
        % Inverse function of function Γ
        t(1) = b(3)*tan(b(1));
        k = sqrt(b(1)^2 + b(2)^2 + b(3)^2);
        t(2) = k*sin(b(2));
        t(3) = 1;
        % unit vector of t
        u = t/sqrt(t(1)^2 + t(2)^2 + t(3)^2);
        c = r(i,j)*u;

        fprintf(fp, '%f %f %f\n', c(1), c(2), c(3));
    end
end
fclose(fp);
```

gdlg commented 5 years ago

I think that the problem is that tan(b(1)) is the ratio x/z, not x itself. If you can assume z = 1, which is the case when reprojecting rectilinear images, it works, but not when z can be negative.
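A quick way to see the ambiguity (plain math, no project code assumed): two longitudes with opposite-sign z share the same tangent, so inverting through tan alone cannot distinguish points in front of the camera from points behind it.

```python
import math

# lon = 3*pi/4 points behind the camera (z = cos(lon) < 0),
# lon = -pi/4 points in front (z > 0), yet tan() gives the same value.
print(math.tan(3 * math.pi / 4))   # -1.0 (up to floating point)
print(math.tan(-math.pi / 4))      # -1.0 (up to floating point)
```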

Here is the code that we used to reproject the centre of a bounding box:

```python
lon = ctr_xy[:, 0]
lat = ctr_xy[:, 1]
targets_y = targets_r * np.sin(lat)
targets_r1 = np.sqrt(targets_r ** 2 - targets_y ** 2)
targets_x = targets_r1 * np.sin(lon)
targets_z = targets_r1 * np.cos(lon)
```

where (lon, lat) would be your b after the multiplication by the matrix inverse (defined at https://github.com/gdlg/panoramic-object-detection/blob/bbc402f0fbf469452872253a994e9404177d8387/detect.py#L248).
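Putting the disparity-to-depth conversion and the lon/lat reprojection together, a full pixel-to-point-cloud sketch might look like the following. The angular mapping (radians per pixel, centred at (w/2, h/2), with an optional c_φ crop offset) is my assumption about the conventions, so treat it as illustrative rather than the repository's exact geometry:

```python
import math
import numpy as np

def panorama_to_xyz(disp, c_phi=None):
    """Reproject a disparity map to 3D points using the lon/lat
    formulation above. Axis and centre conventions are assumed."""
    h, w = disp.shape
    if c_phi is None:
        c_phi = h / 2
    delta = 2 * math.pi / w                 # radians per pixel
    depth = (w / math.pi) / disp            # disparity -> metres

    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    lon = (u - w / 2) * delta
    lat = (v - c_phi) * delta

    y = depth * np.sin(lat)
    r1 = np.sqrt(depth ** 2 - y ** 2)
    x = r1 * np.sin(lon)
    z = r1 * np.cos(lon)
    return np.stack([x, y, z], axis=-1)

# Synthetic check: a constant disparity of w/pi gives depth 1 everywhere,
# and the pixel at the image centre should project straight ahead.
disp = np.full((512, 1024), 1024 / math.pi)
pts = panorama_to_xyz(disp)
print(pts[256, 512])  # ≈ [0, 0, 1]
```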

lasxz commented 5 years ago

I still can't recover xyz using your code. In my program,

```python
disp = np.load(file)
depth = (disp.shape[1] / math.pi) / disp
targets_r = depth
```

but in your link:

```python
targets_z = focal_length / (distance_pred_a[:] * proposal_height + distance_pred_b[:])  # line 226
targets_r = targets_z  # line 248
```

So I wonder if it is only possible to use the targets_z you gave. If so, how do I get the values of distance_pred_a and distance_pred_b? Thank you.

gdlg commented 5 years ago

The code that you are referring to relates to the object detection network, not depth estimation, so it is not relevant. targets_r = depth should be right for your use case.

lasxz commented 5 years ago

I have tried to recover xyz in many ways according to your paper without success. Could you give me reference code for recovering xyz in panoramic-depth-estimation, please? Thank you.

For targets_r = depth, the depth I use is derived from the panoramic-depth-estimation output.

gdlg commented 5 years ago

What problem do you get?

I don’t have any reference code for what you are trying to do. We evaluated the depth map in image space and didn’t try to project it to a point cloud.