RQ-Wu / RIDCP_dehazing

[CVPR 2023] | RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors
https://rq-wu.github.io/projects/RIDCP/index.html

Confusion arises regarding some details of the paper. #2

Closed zhuyr97 closed 1 year ago

zhuyr97 commented 1 year ago

(1) It is unclear how to calculate the code activation frequencies in Figure 4.

(2) The paper mentions using a binary search algorithm to iteratively find an approximate optimal solution for $\hat{\alpha}$, but it is not clear how $\hat{\alpha}$ is obtained with this algorithm.

RQ-Wu commented 1 year ago

This part is admittedly confusing, so let me clarify some details about CHM. (1) The first piece of preliminary knowledge is that our network matches and replaces the features extracted by the encoder with codes from the codebook. When a code is matched, we call this an 'activation'. We count how many times the network matches each code while processing images; the ratio of a code's activations to the total number of matches is that code's 'activation frequency'.
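The counting described above can be sketched as follows. This is an illustrative stand-in, not the repo's actual API: `matched_indices_per_image` represents the nearest-code indices that the encoder features of each image are quantized to, and the function name is ours.

```python
import numpy as np

def activation_frequencies(matched_indices_per_image, codebook_size):
    """Count how often each codebook entry is matched ('activated')
    across a set of images, then normalize to frequencies."""
    counts = np.zeros(codebook_size, dtype=np.int64)
    for indices in matched_indices_per_image:
        # each image contributes one match per spatial feature location
        idx, c = np.unique(np.asarray(indices), return_counts=True)
        counts[idx] += c
    # ratio of each code's activations to the total number of matches
    return counts / counts.sum()

# toy usage: 2 images, codebook of 4 entries
freqs = activation_frequencies([[0, 1, 1, 3], [1, 3, 3, 3]], 4)
```

Summing the frequencies over the whole codebook gives 1, which is what makes the histogram in Figure 4 a distribution that can be compared across datasets.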

(2) P.S.: For convenience of description, the $\alpha$ in the paper is a positive number, but in the code implementation we added a minus sign in front of the formula, so the $\alpha$ in the figure and below is negative. You can read every negative number below as its positive counterpart.

(1) We first assume that the KL divergence is a convex function of $\alpha$, and call this function $f(\alpha)$.
(2) We iterate with a step size of 5 and find $f(-20) < f(-25)$ and $f(-20) < f(-15)$, so $\hat \alpha \in (-25, -15)$. (By the existence theorem for zeros of continuous functions, the derivative of $f$ has a zero in $(-25, -15)$.)
(3) We halve the search step to $2.5$ and find $\hat \alpha \in (-25, -20)$; consequently, we no longer need to search in $(-20, -15)$.
(4) Continuing in the same way, we finally find $\hat \alpha = -21.25$.
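The steps above can be sketched as a generic halving-step search over a convex function. This is a minimal illustration under our own assumptions: `halving_search` is our name, and the quadratic `f` is a toy stand-in for the actual KL curve, placed so its minimum sits at the value quoted above.

```python
def halving_search(f, lo, hi, step, min_step):
    """Locate the minimizer of a convex f on [lo, hi]: evaluate f on a
    grid, keep the one-step bracket around the best point, halve the
    step, and repeat until the step falls below min_step."""
    while step >= min_step:
        grid = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
        best = min(grid, key=f)
        # the minimizer of a convex f lies within one step of the best grid point
        lo = max(lo, best - step)
        hi = min(hi, best + step)
        step /= 2
    return (lo + hi) / 2

# toy convex stand-in for the KL curve, with its minimum at -21.25
f = lambda a: (a + 21.25) ** 2
alpha_hat = halving_search(f, -30.0, 0.0, 5.0, 1.25)
```

Because $f$ is assumed convex, narrowing to the bracket around the best grid point never discards the true minimizer, which is why the interval $(-20, -15)$ can be dropped after the first halving.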

zhuyr97 commented 1 year ago

Thank you for your detailed responses and for addressing my confusion.

According to the original paper, the computation of $\hat{\alpha}$ is based on the features of a pre-trained VQGAN with 200 hazy/clean image pairs.

(i) Will the final value of $\hat{\alpha}$ be slightly different if the input images are changed?

(ii) When I adjust the $\hat{\alpha}$ to adapt to my preference, am I still using the model obtained from training with the default value of 21.25?

RQ-Wu commented 1 year ago

(1) Yes. The input images and the random perturbations used during training will influence the default value. (2) You can adjust $\hat \alpha$ as you like.

RQ-Wu commented 1 year ago

Moreover, we do not use image pairs. The computation of $\hat \alpha$ is based on the features of a pre-trained VQGAN on 200 clean images from a high-quality dataset (e.g., Flickr) and the features of the dehazing network on 200 images from a real hazy dataset (e.g., RTTS).

We hope that the distribution of dehazing results and high-quality data can be as close as possible.
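The "closeness" being minimized can be sketched as a KL divergence between the two sets of code-activation frequencies. This is a hedged illustration of the objective implied above, not the repo's code: the function name and the epsilon smoothing are our additions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """D_KL(p || q) between two activation-frequency distributions;
    eps smoothing avoids log(0) for codes that are never activated."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# toy usage: compare dehazed-output frequencies against clean-image ones
hq = [0.25, 0.25, 0.25, 0.25]
dehazed = [0.40, 0.10, 0.30, 0.20]
gap = kl_divergence(dehazed, hq)
```

An identical pair of distributions gives (near-)zero divergence, so driving this quantity down by tuning $\alpha$ pushes the dehazing results toward the high-quality statistics.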

zhuyr97 commented 1 year ago

Thanks!