PiLab-CAU / ComputerVision-2401

Computer Vision Course 2024-01
Apache License 2.0

[Lecture4][0412] 4 Questions about Harris Corner, SIFT #14

Closed: choigiheon closed this issue 4 months ago

choigiheon commented 7 months ago

@yjyoo3312, I have 4 questions about Harris Corner and SIFT.

  1. This question is about issue #5. I want to double-check whether my understanding is correct. There are two ways to blur an image: applying a Gaussian blur, or downsampling and then resampling the image. If the blurring is done with the former method, the result is a Gaussian pyramid, while the latter method results in a Laplacian pyramid. Is that right?

  2. [Image 1]
[Image 2]

It seems that the meaning of "scale" differs between the first and second images. In the first image, it appears to refer to octaves, while in the second image it seems to refer to layers, i.e., different sigmas. Are both called "scale" because both concepts relate to the notion of scale?

  3. [Image 3]

SIFT does not use the Harris corner detector along the 'space' axis; instead, it uses the DoG (Difference of Gaussians). As we learned in the previous lecture, the DoG approximates the Laplacian. Does SIFT therefore treat local maxima of the Laplacian as keypoints? Is that reasonable? I'm not sure what the local maxima of the Laplacian mean.

  4. In SIFT, has the DoG along the 'scale' axis been replaced by scale-space extrema?

Thank you for reading my lengthy text!

yjyoo3312 commented 6 months ago

@choigiheon Thank you for the summary! It is very helpful in clarifying the explanations.

  1. Yes. The key property of the Laplacian pyramid is that it stores the difference between blurred images, which approximates a Laplacian-filtered image. A Laplacian pyramid usually stores the residuals obtained by subtracting a downsampled-then-resampled copy from each level, whereas a Difference of Gaussians (DoG) pyramid stores the differences between images blurred with different Gaussian sigmas. Many people also call the DoG pyramid a Laplacian of Gaussian (LoG) pyramid.

  2. That's a valid concern. For scale-space extrema, each sample is compared with its 26 neighbors in the DoG images at the same, preceding, and following intervals within the same octave (8 in its own DoG image and 9 in each adjacent one, i.e., a 3×3×3 cube minus the sample itself). Here, 'scale' refers to the level of blur set by the different sigma values applied to images of the same resolution within one octave. A blurrier image approximates a more strongly downscaled image.

  3. Correct. The SIFT algorithm primarily identifies keypoints in LoG images, which can be approximated by DoG images. The shape of the Laplacian (or DoG) kernel is crucial here: its maximum response typically corresponds to corner-like features. In 1D it responds to discontinuities along a single axis, and in 2D it responds to discontinuities along both the x and y axes, which indicates a corner.

  4. Yes. As mentioned in the answer to question 2, the DoG computed from varying sigma levels essentially defines the 'Scale' axis in Figure 3 for each octave.
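As a concrete illustration of answers 2 to 4, the sketch below builds a single SIFT-style octave in plain NumPy and searches it for scale-space extrema. It is not the course's or OpenCV's implementation: the helper names (`gaussian_blur`, `dog_octave`, `scale_space_extrema`) are hypothetical, the defaults sigma0 = 1.6 and 3 intervals per octave follow Lowe's SIFT paper, and the 0.03 cut-off is only a rough stand-in for Lowe's contrast test (sub-pixel refinement and edge rejection are omitted). All images in the octave share one resolution and differ only in sigma, which is the 'scale' axis discussed above, and each sample is compared with the 26 other samples in its 3×3×3 cube.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable blur with a sampled, normalised Gaussian (radius 4*sigma, reflect borders)."""
    r = int(np.ceil(4 * sigma))
    t = np.arange(-r, r + 1)
    k = np.exp(-(t ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    h, w = img.shape
    p = np.pad(img, r, mode="reflect")
    tmp = sum(k[i] * p[:, i:i + w] for i in range(2 * r + 1))     # horizontal pass
    return sum(k[i] * tmp[i:i + h, :] for i in range(2 * r + 1))  # vertical pass


def dog_octave(img, sigma0=1.6, s=3):
    """One octave: s + 3 Gaussian images at sigma0 * k**i with k = 2**(1/s),
    and the s + 2 DoG images between neighbouring sigmas (same resolution)."""
    k = 2.0 ** (1.0 / s)
    sigmas = [sigma0 * k ** i for i in range(s + 3)]
    gauss = np.stack([gaussian_blur(img, sg) for sg in sigmas])
    return gauss[1:] - gauss[:-1], sigmas


def scale_space_extrema(dogs, threshold=0.03):
    """Return (dog index, y, x) where a DoG sample is strictly larger or
    strictly smaller than all 26 neighbours in the 3x3x3 cube around it."""
    keypoints = []
    n, h, w = dogs.shape
    for i in range(1, n - 1):              # need a DoG image above and below
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dogs[i, y, x]
                if abs(v) < threshold:     # skip weak, low-contrast responses
                    continue
                cube = dogs[i - 1:i + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                neighbours = np.delete(cube, 13)   # index 13 is the centre sample
                if v > neighbours.max() or v < neighbours.min():
                    keypoints.append((i, y, x))
    return keypoints


# Synthetic test image: one bright Gaussian blob (sigma = 3 px) centred at (32, 32).
yy, xx = np.mgrid[0:64, 0:64]
image = np.exp(-((xx - 32.0) ** 2 + (yy - 32.0) ** 2) / (2 * 3.0 ** 2))

dogs, sigmas = dog_octave(image)
keypoints = scale_space_extrema(dogs)
for i, y, x in keypoints:
    print(f"DoG index {i} (sigma {sigmas[i]:.2f} -> {sigmas[i + 1]:.2f}) at (y={y}, x={x})")
```

A bright blob gives a negative DoG response, so the point found at the blob centre is a minimum, not a maximum; SIFT keeps both maxima and minima. The surviving point sits in the DoG layer whose sigma best matches the blob size, because the DoG between sigma and k*sigma approximates the scale-normalised Laplacian (k - 1) * sigma^2 * LoG.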

choigiheon commented 6 months ago

Thank you for your kindness!! Now I understand what Harris Corner and SIFT are.

Note: This is not a question; I just want to share the result below with other students.

[Screenshot 2024-04-14 at 15:53:39]

First: directly applying a Laplacian kernel / Second: Laplacian (or DoG) from the Laplacian pyramid / Third: Laplacian (or DoG) from the DoG pyramid

This result differed from my expectation. I thought the Laplacian pyramid could not approximate the Laplacian (or DoG), because the Laplacian pyramid's downsampling and resampling don't include Gaussian filters, while Gaussian filters are essential to the DoG.

However, the three images are similar! The reason is that in OpenCV, downsampling and resampling are done by cv2.pyrDown() and cv2.pyrUp(), respectively, and both include Gaussian filtering: cv2.pyrDown() blurs the image with a 5×5 Gaussian kernel and then drops every other row and column, while cv2.pyrUp() inserts zero rows and columns and then blurs with the same kernel multiplied by 4. I think the results would be different if an implementation of downsampling or resampling did not include a Gaussian blur.
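To reproduce the spirit of this comparison without OpenCV, here is a minimal NumPy sketch. It is not the code behind the screenshot: `blur`, `pyr_down`, `pyr_up` and the other helpers are hypothetical, written to mimic what the OpenCV documentation describes for cv2.pyrDown() and cv2.pyrUp() (a 5×5 kernel built from [1, 4, 6, 4, 1]/16, reflect-101 borders), and a synthetic Gaussian blob stands in for a real photo. It compares the finest Laplacian-pyramid level, a DoG, and a naive pyramid level with no blur at all against a directly applied Laplacian kernel, using the correlation coefficient as a rough similarity score.

```python
import numpy as np

# 1D binomial kernel; its outer product is the 5x5 kernel (1/256 scale)
# that the OpenCV docs give for cv2.pyrDown / cv2.pyrUp.
K = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img):
    """Separable 5x5 binomial blur; np.pad 'reflect' matches BORDER_REFLECT_101."""
    h, w = img.shape
    p = np.pad(img, 2, mode="reflect")
    tmp = sum(K[i] * p[:, i:i + w] for i in range(5))      # horizontal pass
    return sum(K[i] * tmp[i:i + h, :] for i in range(5))   # vertical pass


def pyr_down(img):
    """Blur first, then keep every other row and column (like cv2.pyrDown)."""
    return blur(img)[::2, ::2]


def pyr_up(img, shape):
    """Insert zero rows/columns, then blur with the kernel times 4 (like cv2.pyrUp)."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)


def laplacian_level(img):
    """Finest Laplacian-pyramid level: image minus its down-then-up copy."""
    return img - pyr_up(pyr_down(img), img.shape)


def naive_level(img):
    """Same idea with plain decimation and nearest-neighbour upsampling (no blur)."""
    small = img[::2, ::2]
    return img - np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)


def direct_laplacian(img):
    """4-neighbour discrete Laplacian kernel applied directly."""
    p = np.pad(img, 1, mode="reflect")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img


def dog(img):
    """Difference of Gaussians, finer minus coarser (sigma about 1 vs sqrt(2))."""
    b1 = blur(img)
    return b1 - blur(b1)


def corr(a, b):
    """Pearson correlation of two images, used as a rough similarity score."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


# Synthetic smooth test image: a Gaussian blob (sigma = 8 px) on a 64x64 grid.
y, x = np.mgrid[0:64, 0:64]
image = np.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / (2 * 8.0 ** 2))

lap_direct = direct_laplacian(image)
lap_pyr = laplacian_level(image)
lap_dog = dog(image)
lap_naive = naive_level(image)

# "Finer minus coarser" is roughly -c * Laplacian, so a strong NEGATIVE
# correlation with the direct kernel means "same structure, flipped sign".
print("pyramid vs direct:", round(corr(lap_pyr, lap_direct), 3))
print("DoG     vs direct:", round(corr(lap_dog, lap_direct), 3))
print("pyramid vs DoG   :", round(corr(lap_pyr, lap_dog), 3))
print("naive   vs direct:", round(corr(lap_naive, lap_direct), 3))
```

On this smooth test image the blurred pyramid level and the DoG should both track the Laplacian closely, while the naive level correlates only weakly, which is in line with the guess above; a single synthetic image is of course not proof. With OpenCV installed, `cv2.pyrDown(img)` and `cv2.pyrUp(small, dstsize=(w, h))` could replace the two hand-written helpers.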