anuranbaka / OpenDTAM

An open source implementation of DTAM

weird result on Trajectory for Variable Frame-Rate dataset #51

Open ShuangLiu1992 opened 7 years ago

ShuangLiu1992 commented 7 years ago

[screenshot from 2016-10-02 15-58-26]

The test data was downloaded from https://www.doc.ic.ac.uk/~ahanda/VaFRIC/test_datasets.html.

The program seems to be producing a weird result on the computer monitor, presumably because it is all black and the cost/correspondence there is ambiguous. Is there a quick fix to penalise large depth discrepancies when there isn't enough confidence to support them?

ShuangLiu1992 commented 7 years ago

Also, could somebody please shed some light on how to perform the Newton step with OpenDTAM's data structures to get a subpixel-level result?

anuranbaka commented 7 years ago

Regarding your first question, we already penalize depth discrepancy when there is no other information: that is the AGd term in Eq. 11. The problem is that specular shine provides bad information; this is one of the fundamental difficulties of DTAM and similar pixel-centric approaches. Even when Newcombe demonstrated it in person, computer monitors would blow out. The problem is hard to fix, because the virtual image of reflected light in the monitor has higher magnitude than many real features.

For the second question: the Newton step should already be included in the A step update; that is why it keeps track of the values around the minimum and then produces a non-integer step. It solves for the minimum of the best-fit parabola. There is still some quantization-related error, though. I have found this is actually smaller on real video, I assume because the focus of most real cameras is worse (more Gaussian) than in the VaFRIC data.

AFAIK the only part of DTAM not implemented is the accelerated exhaustive search. You could do this with some fairly simple math during the A step update, but I just didn't get around to it.
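
In case it helps, the sub-layer refinement described above amounts to something like the following (a minimal standalone sketch with made-up names, not the actual A step kernel code):

```cpp
#include <algorithm>
#include <vector>

// Sketch: refine the integer cost-volume minimum to a sub-layer value by
// fitting a parabola through the costs at (m-1, m, m+1) and jumping to its
// vertex, i.e. the "Newton step" on the best-fit parabola.
float subLayerMinimum(const std::vector<float>& cost) {
    int m = int(std::min_element(cost.begin(), cost.end()) - cost.begin());
    if (m == 0 || m == int(cost.size()) - 1)
        return float(m);                      // no neighbour on one side
    float cm1 = cost[m - 1], c0 = cost[m], cp1 = cost[m + 1];
    float denom = cm1 - 2.0f * c0 + cp1;      // curvature of the parabola
    if (denom <= 0.0f)
        return float(m);                      // degenerate fit, keep integer
    return m - 0.5f * (cp1 - cm1) / denom;    // vertex of the fitted parabola
}
```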

ShuangLiu1992 commented 7 years ago

Thank you for your reply! I also noticed the artefacts on the monitor in the video demo of the DTAM paper, but there doesn't seem to be any specular shine on the monitor in these particular synthetic images provided in the link?

anuranbaka commented 7 years ago

That's odd, I remember the top of the printer specifically having a problem in that dataset, but I don't remember for sure about the monitor. You could check it in a 16-bit image editor. If there's actually no specular data there, then something is seriously wrong in the optimizer.

Also, your solution looks mirrored to me, is there a reason for that?

-Paul

ShuangLiu1992 commented 7 years ago

Right now the code in the repo doesn't support OpenCV 3.0, so I rewrote some parts to make it compatible with OpenCV 3.0. The reason the solution looks mirrored might be that I flipped it when saving it as .ply or .obj files.

It turns out the weird result might be because I dropped the third parameter when loading the image (by the way, imread(path, -1) doesn't do what it is supposed to do in 3.0 anymore), since I didn't understand (still don't understand) what it was doing. After adding the third parameter back, the result looks more acceptable now, but the monitor still isn't a very flat surface.

imread(png[imageNumber].string(), cv::IMREAD_UNCHANGED).convertTo(image, CV_32FC3, 1.0 / range, 1.0 / 255);

[screenshot from 2016-10-02 21-28-48] [screenshot from 2016-10-02 21-29-09]

Also, what's the projection formula in OpenDTAM? I want to be able to convert another pipeline's camera matrix, rotation matrix, and translation vector to the OpenDTAM format for testing, e.g. openMVG, openMVS, or other SLAM tracking such as ORB-SLAM, LSD-SLAM, etc.

anuranbaka commented 7 years ago

I suspect the reason for the 1.0/255 is to avoid having totally black regions match the out-of-bounds border fill that the GPU uses when making the cost volume, i.e. to avoid this specific type of problem. But it has been a long time since I wrote that, so I'm not sure.
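
For reference, that loading line is just cv::Mat::convertTo with a scale and an offset (dst = src * alpha + beta per element). A rough sketch of the same idea, with a made-up helper name:

```cpp
#include <string>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>

// Sketch of what the loading line above does: alpha = 1/range rescales the
// pixel values and beta = 1/255 lifts pure black slightly above zero
// (presumably to keep it away from the GPU's border fill, per the guess above).
cv::Mat loadForCostVolume(const std::string& path, double range) {
    cv::Mat image;
    cv::imread(path, cv::IMREAD_UNCHANGED)
        .convertTo(image, CV_32FC3, 1.0 / range, 1.0 / 255);
    return image;
}
```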

I tried to make the external interfaces for OpenDTAM follow the conventions for the opencv calib module, described here http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html. Internally, it uses the x, y from the keyframe, and inverse depth. The internal inverse depth is scaled so that the center of the first layer of voxels corresponds to the far plane (since far<near in inverse depth) and the center of the last layer corresponds to the near plane. This is all calculated for you. In the end, you can get the world [x;y;z;w] from CostVolume.projection.inv()*[col;row;layer;1.0], with the usual perspective divide.
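
A minimal sketch of that back-projection, assuming projection is available as a 4x4 CV_64F cv::Mat (the helper name is made up):

```cpp
#include <opencv2/core.hpp>

// Sketch: map a cost-volume coordinate (col, row, layer) back to a world-space
// point via the inverse of the 4x4 projection matrix, followed by the usual
// perspective divide.
cv::Point3d costVolumeToWorld(const cv::Mat& projection,  // 4x4, CV_64F
                              double col, double row, double layer) {
    cv::Mat p = (cv::Mat_<double>(4, 1) << col, row, layer, 1.0);
    cv::Mat w = projection.inv() * p;         // homogeneous world point
    double s = w.at<double>(3);               // perspective divide
    return cv::Point3d(w.at<double>(0) / s,
                       w.at<double>(1) / s,
                       w.at<double>(2) / s);
}
```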

ShuangLiu1992 commented 7 years ago

I'm getting a weird result from the converted camera matrix. Is the camera matrix applied to the translation vector before it is added to the vertices?

Isn't the projection formula supposed to be: projection = camera * (rotation * vertex + translation) / z?

It seems that in OpenDTAM the formula is: projection = (camera * rotation * vertex + translation) / z?

Say I have a bunch of images and their corresponding camera positions and rotations in openMVG's format, but I don't know near, far, or the depth step. How can I convert the camera parameters to the OpenDTAM format?

In theory, I can just replace the following code:

float wi = p.data[8] * xf + p.data[9] * yf + p.data[11];
float xi = (p.data[0] * xf + p.data[1] * yf + p.data[3]);
float yi = (p.data[4] * xf + p.data[5] * yf + p.data[7]);
float minv = 1000.0, maxv = 0.0;
float mini = 0;
for (unsigned int z = 0; z < layers; z++) {
    float c0 = cdata[offset + z * layerStep];
    float w = hdata[offset + z * layerStep];
    float wiz = wi + p.data[10] * z;
    float xiz = xi + p.data[2] * z;
    float yiz = yi + p.data[6] * z;
    float4 c = tex2D(tex, xiz / wiz, yiz / wiz);

in CostVolume.cu with my own projection formula, and the denoiser and optimizer would still work, is that right?

anuranbaka commented 7 years ago

OpenDTAM does use camera * (rotation * vertex + translation) / z <-- this is real z, but only for x and y.

The trick is that the third coordinate OpenDTAM uses internally is not real z. It is given by: OpenDTAM_z = (1/real_z - 1/far) / (1/near - 1/far) * (num_layers - 1). I used a bit of a math trick to get all that to work in the OpenDTAM projection matrix without having to do extra divides for each pixel.
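
As plain scalar helpers, that mapping and its inverse look roughly like this (a sketch; the function names are made up):

```cpp
// Sketch: map a real (metric) depth to OpenDTAM's internal layer coordinate
// and back. Layer 0 sits at the far plane, layer (num_layers - 1) at the near
// plane, per the formula above.
inline float realZToLayer(float z, float near, float far, int num_layers) {
    return (1.0f / z - 1.0f / far) / (1.0f / near - 1.0f / far) * (num_layers - 1);
}

inline float layerToRealZ(float layer, float near, float far, int num_layers) {
    float invZ = layer / (num_layers - 1) * (1.0f / near - 1.0f / far) + 1.0f / far;
    return 1.0f / invZ;
}
```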

The reason for all this work is that for stereo the real z depth is not a natural measure. The natural measure is 1/z (i.e. the ideal estimator for stereo is heteroskedastic in z, but homoskedastic in 1/z). I then just do a linear transformation on 1/z to make it range over [0, number_of_layers_in_cost_volume - 1].

You can use real z for the projection if you like, but the results are worse if the range of depths being solved for is an appreciable fraction of the distance to the nearest depth (e.g. if you reconstruct things between 9 and 10m from the camera, then real z will probably work fine, but if you try to reconstruct things from 5 to 20m, then it will be hard to get the whole cost volume to denoise properly, and the Newton step will be biased. Worse, if you try to use 5m to infinity then real z doesn't even make sense).
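
Putting it together, here is a rough sketch of how one could fold that inverse-depth remapping into a single 4x4 projection. The function name is made up and details (e.g. voxel-centre offsets) may differ from CostVolume's actual construction:

```cpp
#include <opencv2/core.hpp>

// Sketch: build a 4x4 matrix P so that P * [X; Y; Z; 1] ~ [col; row; layer; 1]
// after the perspective divide, where layer is the remapped inverse depth from
// the formula above. K is the 3x3 intrinsic matrix and (R, t) map world to
// camera coordinates, x_cam = R * X + t.
cv::Matx44d makeDtamStyleProjection(const cv::Matx33d& K,
                                    const cv::Matx33d& R,
                                    const cv::Vec3d&   t,
                                    double near, double far, int layers) {
    double a = (layers - 1) / (1.0 / near - 1.0 / far); // 1/z -> layer scale
    double b = -a / far;                                 // layer 0 at z = far

    cv::Matx44d K4(K(0, 0), 0,       K(0, 2), 0,
                   0,       K(1, 1), K(1, 2), 0,
                   0,       0,       b,       a,
                   0,       0,       1,       0);

    cv::Matx44d T(R(0, 0), R(0, 1), R(0, 2), t(0),
                  R(1, 0), R(1, 1), R(1, 2), t(1),
                  R(2, 0), R(2, 1), R(2, 2), t(2),
                  0,       0,       0,       1);

    return K4 * T;
}
```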

In the stereo literature they say that disparity, and not depth, is the natural measure. Disparity is proportional to 1/depth, so it is basically the same argument as above. openMVG and other feature-based approaches don't have to worry about this because they are solving for minimal residuals in the image plane rather than in 3D, which automatically removes the heteroskedasticity issues.

The denoiser is unaffected by all of this. The optimizer will still work with real z, but will produce biased results.

ShuangLiu1992 commented 7 years ago

Thank you so much for your thorough explanation, I will try it. In my case I already have the vertices, rotation, and translation of an object in the scene; is there some way to validate that the combination of my rotation, translation, and camera input projects the vertices to the right screen coordinates in OpenDTAM?

Also, since I only need a rough depth map to work with, do you think DTAM will definitely be faster or more suitable than other multi-view stereo algorithms, for example PatchMatch stereo?

Can I email you privately about my idea/use case for OpenDTAM and discuss some technical details with you? I don't know if you are still working closely with academia, but I'm trying to write a paper on facial SLAM; maybe we could work together?

anuranbaka commented 7 years ago

That sounds interesting. I don't know how much time I have for actual work on it, but I can certainly give advice.

As for writing a paper, I'm good at editing but very bad at writing papers (my thoughts don't really go in order), which is a lot of why I'm not a PhD anymore.

Anyway, you could email me at com.soartech@foster.paul <-words reversed to avoid bots -Paul

ShuangLiu1992 commented 7 years ago

Just emailed you, please let me know if you have received it. Thank you! -Shuang

melights commented 7 years ago

Hi @ShuangLiu1992 ,

Your reconstruction looks amazing. Would it be possible to share with me the 3D reconstruction method you used?

Many thanks, Melights

nonlinear1 commented 4 years ago

@ShuangLiu1992 Have you solved your problem?