PiLab-CAU / ComputerVision-2401

Computer Vision Course 2024-01
Apache License 2.0
9 stars 3 forks source link

[Lecture3][0321]Questions about LoG, DoG #5

Closed choigiheon closed 4 months ago

choigiheon commented 8 months ago

@yjyoo3312, I have two questions about Laplacian of Gaussian (LoG)!

Screenshot 2024-03-21 at 13 47 07

1) When using LoG, if edge detection is based on zero-crossings, how do we differentiate between case 1 and case 2 in Figure, which are both zero-crossing but one is on edge and the other is not?

2) I heard that the reason for approximating LoG with DoG is due to computational complexity. However, since convolutional filters are fixed in advance, it doesn't seem necessary to worry about computation. Are there cases where LoG is computed multiple times?

Thank you very much!

yjyoo3312 commented 8 months ago

@choigiheon (just call me @yjyoo3312 in this repo:) ) Thank you for the constructive questions!

  1. Addressing the first issue, we can examine neighboring pixels around zero crossings to provide a solution. While not perfect (yet quick and accurate in most cases), the example convolutional kernel could be considered, as:
filter = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1]
])
  1. The Difference of Gaussians (DoG) is broadly utilized because keeping a collection of Gaussian-blurred images, known as a Gaussian Pyramid, is a standard practice in Computer Vision for a diverse range of applications, https://github.com/PiLab-CAU/ComputerVision-2401/blob/5ca29bcaa331732d9ac8bb3ee6b91318f623665c/lecture4/example_SIFT_HoG.py#L23 such as super resolution, and keypoint detection. It is because this technique is employed to capture images across multiple scales . Often, as illustrated in the reference below, the pyramid of images subtracting two different level is referred to as a Laplacian pyramid, although it is not precisely the same as a Laplacian of Gaussian. To summarize, in numerous computer vision applications, we typically maintain a pyramid of images blurred at varying levels. By subtracting these images, we obtain an approximation of Laplacian of Gaussian (LoG) images, rather than applying the exact Laplacian kernel directly.
    • The term 'Laplacian pyramid' is often used to describe the structure resulting from the subtraction of two images that have been smoothed differently, especially when these images are subsequently downsampled and rescaled. (It's important to note that this subtraction process approximates the LoG operation and is not exact.)
    • The term 'DoG pyramid' is used when referring to a pyramid where each level results from subtracting two images that have undergone Gaussian smoothing with different sigma values. (actually, the term Laplacian pyramid is much widely used)
    • Of course, both terms, in many case, are used without clear distinction in our field, like convolution and correlation :)

[References] https://arxiv.org/pdf/1506.05751v1.pdf <- this one is a classical GAN paper, applying the Laplacian Pyramid. https://sepwww.stanford.edu/data/media/public/sep/morgan/texturematch/paper_html/node3.html

choigiheon commented 8 months ago

I think I roughly understand, but I'm not sure. Is the reason for not directly applying the LoG filter and instead subtracting two different Gaussian-blurred images because the collection of Gaussian-blurred images is used for a different purpose other than LoG?

And if you want to apply only the LoG operation, would applying the LoG filter directly be the better approach?

yjyoo3312 commented 8 months ago

Yepp, indeed. Since creating a Gaussian image pyramid is a common practice for various computer vision applications, subtraction is an efficient choice to obtain a LoG image for those cases. Nonetheless, when the task requires applying a LoG operation to an image directly, utilizing a LoG kernel also serves as a good alternative.

choigiheon commented 8 months ago

Ok, I got it! Thank you for answering!!! :D