RoboticsIIITH / summer-sessions-2019


Assignment-3: Two-view reconstruction #10

Open karnikram opened 5 years ago

karnikram commented 5 years ago

Task:

To reconstruct the sparse structure of a scene from two given images of it.

[Image: assign-3-sample]

Steps:

Files:

The accompanying files can be found here. MATLAB (with the Computer Vision Toolbox) is recommended for this assignment since it is easier to get started with.

Deliverables:

Email your results to karnikram@gmail.com and ansariahmedjunaid@gmail.com.

Deadline:

Wednesday, the 29th.

Use this thread for any doubts you might have.

aadilmehdis commented 5 years ago

Can you please explain the construction of the normalization transform matrix T?

Currently, with the definition of T, it seems like we are shrinking the image coordinates into the range [-sqrt(2), sqrt(2)], instead of [-1, 1]. Could you explain why we are doing this?

Moreover, the scaling factor is the same along the x and y directions. However, if we are normalizing the points, it should be different for the x and y directions. Possibly the average width and height of the image. Could you explain why we are taking the Euclidean distance of the image points as well?

TIA

karnikram commented 5 years ago

Good questions.

> Currently, with the definition of T, it seems like we are shrinking the image coordinates into the range [-sqrt(2), sqrt(2)], instead of [-1, 1]. Could you explain why we are doing this?

This normalization matrix first translates all the image coordinates so that their centroid is at the origin, and then applies a scaling so that the average distance of a point from the origin is sqrt(2). This means that an "average" point is equal to (1, 1, 1) in homogeneous coordinates. This is desirable because the entries of the A matrix will then all have similar magnitude. In DLT we are in effect minimizing ||Af|| subject to ||f|| = 1, where f is the vector of unknowns, so when the columns of A have similar magnitude, every entry of f has a comparable influence on the residual and the solution is not skewed by a few large entries. This way the algorithm becomes more numerically stable.
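As a rough illustration of the construction described above, here is a minimal sketch in plain Python (the assignment itself recommends MATLAB, so treat this only as a reference for the arithmetic). The function names `normalization_transform`, `apply_transform` and `epipolar_row`, the sample pixel coordinates, and the row layout of A (the usual one for the constraint x2^T F x1 = 0) are assumptions made for this example and are not taken from the assignment files.

```python
import math

def normalization_transform(points):
    """Build a 3x3 matrix T (nested lists) that moves the centroid of
    `points` to the origin and scales them so that the mean Euclidean
    distance from the origin is sqrt(2). Assumes the points are not all
    identical (otherwise the mean distance is zero)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    s = math.sqrt(2) / mean_dist  # one isotropic scale for both axes
    return [[s, 0.0, -s * cx],
            [0.0, s, -s * cy],
            [0.0, 0.0, 1.0]]

def apply_transform(T, points):
    """Apply T to 2D points via homogeneous coordinates (x, y, 1)."""
    out = []
    for x, y in points:
        u = T[0][0] * x + T[0][1] * y + T[0][2]
        v = T[1][0] * x + T[1][1] * y + T[1][2]
        w = T[2][0] * x + T[2][1] * y + T[2][2]
        out.append((u / w, v / w))
    return out

def epipolar_row(p1, p2):
    """One row of A for the constraint x2^T F x1 = 0 (assumed layout)."""
    x1, y1 = p1
    x2, y2 = p2
    return [x2 * x1, x2 * y1, x2, y2 * x1, y2 * y1, y2, x1, y1, 1.0]

# Made-up pixel coordinates of four correspondences in the two images.
pts1 = [(100.0, 200.0), (620.0, 340.0), (350.0, 80.0), (480.0, 450.0)]
pts2 = [(110.0, 205.0), (600.0, 330.0), (365.0, 95.0), (470.0, 440.0)]

T1 = normalization_transform(pts1)
T2 = normalization_transform(pts2)
norm1 = apply_transform(T1, pts1)
norm2 = apply_transform(T2, pts2)

row_raw = epipolar_row(pts1[0], pts2[0])
row_norm = epipolar_row(norm1[0], norm2[0])
print("largest raw entry:       ", max(abs(v) for v in row_raw))
print("largest normalized entry:", max(abs(v) for v in row_norm))
```

With raw pixel coordinates the entries of a row range from 1 up to products of two pixel values, whereas after normalization they are all of order one. If F_hat is estimated from the normalized points, the matrix for the original pixel coordinates is recovered as F = T2^T F_hat T1.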

This point is explained in much more detail in this paper by Hartley.

> Moreover, the scaling factor is the same along the x and y directions. However, if we are normalizing the points, it should be different for the x and y directions. Possibly the average width and height of the image.

In the paper, he also shows that applying non-isotropic scaling (different factors for the x and y directions) has little effect on the results.
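For comparison, a per-axis variant could look like the sketch below. This is a simplified, axis-aligned version written for illustration (it scales x and y by their own standard deviations) and is not claimed to be the exact non-isotropic scheme used in the paper; the function name `anisotropic_transform` and the sample points are hypothetical.

```python
import math

def anisotropic_transform(points):
    """Build a 3x3 matrix that centres `points` and scales x and y
    separately so that each coordinate has unit RMS value. Assumes the
    points have non-zero spread along both axes."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sx = math.sqrt(sum((x - cx) ** 2 for x, _ in points) / n)
    sy = math.sqrt(sum((y - cy) ** 2 for _, y in points) / n)
    return [[1.0 / sx, 0.0, -cx / sx],
            [0.0, 1.0 / sy, -cy / sy],
            [0.0, 0.0, 1.0]]

# Made-up pixel coordinates, wider in x than in y.
pts = [(100.0, 200.0), (620.0, 340.0), (350.0, 80.0), (480.0, 450.0)]
T = anisotropic_transform(pts)

# T is affine (last row 0, 0, 1), so no homogeneous division is needed.
scaled = [(T[0][0] * x + T[0][2], T[1][1] * y + T[1][2]) for x, y in pts]

rms_x = math.sqrt(sum(u * u for u, _ in scaled) / len(scaled))
rms_y = math.sqrt(sum(v * v for _, v in scaled) / len(scaled))
print(rms_x, rms_y)
```

Each coordinate then has unit RMS value, so the RMS distance of the points from the origin is again sqrt(2); only the balance between the two axes differs from the isotropic case.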

> Could you explain why we are taking the Euclidean distance of the image points as well?

I don't have an answer for why we use the L2 norm rather than some other norm.