danruod / IEDL

[ICML 2023] Official implementation of the paper "Uncertainty Estimation by Fisher Information-based Evidential Deep Learning".
MIT License

Seeking help to understand an assumption in Section 3.1 #1

Closed MengyuanChen21 closed 1 year ago

MengyuanChen21 commented 1 year ago

Dear Authors, I would like to express my admiration for the extraordinary work you have done, which has contributed significantly to the field. Your detailed research and insightful analysis are greatly appreciated.

Upon careful reading of your esteemed publication, I came across a section that raised a few questions about its underpinnings. In Section 3.1, there is a reference to the work of Sensoy et al. (2018), which suggests that EDL assumes the observed labels $y$ to be independently and identically drawn from an isotropic Gaussian distribution, i.e., $y\sim\mathcal{N}(p,\sigma^2 I)$, where $p\sim Dir(f_\theta(x)+1)$.

The aspect that I found perplexing relates to the encoding of $y$ as a one-hot vector, as stated in your paper. It is not entirely clear to me how a one-hot vector can adhere to a Gaussian distribution.

Furthermore, I noticed an apparent departure from the original work of Sensoy et al. As far as I understand, their paper does not appear to make any explicit connection to an isotropic Gaussian distribution.

Could you kindly shed some light on this apparent discrepancy? I would greatly appreciate any insights you could provide on how the Gaussian distribution is tied to Sensoy et al.'s work, or if the assumption is a modification or extension of the original model that better suits your research objectives.

Thank you in advance for your time and consideration. I look forward to your valued response.

danruod commented 1 year ago

Dear Mengyuan,

Thank you very much for your attention and appreciation of our work. Here are my responses to your questions:

Hope the above content can answer your questions. Thank you again for your interest in our work. Please feel free to discuss if you have any other questions.

MengyuanChen21 commented 1 year ago

Thanks so much for the reply! Now I have a much clearer understanding of Section 3.1.
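Just to record my takeaway, in case it helps other readers (please correct me if this reading is wrong): the isotropic Gaussian is not a literal claim that the one-hot $y$ follows a Gaussian distribution; rather, it is the implicit observation model whose expected negative log-likelihood recovers the MSE-style loss of Sensoy et al. (2018). That is,

$$-\ln \mathcal{N}(y\mid p,\sigma^2 I) = \frac{1}{2\sigma^2}\,\|y-p\|^2 + \text{const},$$

and taking the expectation over $p\sim Dir(\alpha)$ with $\alpha=f_\theta(x)+1$ and $S=\sum_j\alpha_j$ gives

$$\mathbb{E}_{p\sim Dir(\alpha)}\big[\|y-p\|^2\big] = \sum_j \left(y_j-\frac{\alpha_j}{S}\right)^2 + \frac{\alpha_j(S-\alpha_j)}{S^2(S+1)},$$

which is exactly the expected sum-of-squares loss used by Sensoy et al. (2018).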

However, I still have a question about the application of the PAC-Bayesian bound in Section 3.3. It seems that the last term in Theorem 3.1, $\Psi_{\mathcal{P},\pi}(\lambda,n)$, is omitted in Eq. (4). Could you shed some light on the rationale behind this omission? I have carefully read the supplementary material, but I am still confused about it.

Thanks again for your time and consideration!

danruod commented 1 year ago

The PAC-Bayesian bound (Theorem 3.1) is derived from the theorems of Germain et al. (2009), Alquier et al. (2016), and Masegosa (2020). The omission of $\Psi_{\mathcal{P},\pi}(\lambda,n)$ also follows from these references. More specifically, Section 4 of Alquier et al. (2016) and Section 3.2 of Masegosa (2020) both state that the term $\Psi_{\mathcal{P},\pi}(\lambda,n)$ is constant w.r.t. $\rho$. In fact, this conclusion can be read off directly from the definition of $\Psi_{\mathcal{P},\pi}(\lambda,n)$.
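To sketch the argument (with notation that may differ slightly from the paper): the Alquier-type bound states that, with probability at least $1-\delta$ over the sample $S$ of size $n$, for all posteriors $\rho$,

$$\mathbb{E}_{\theta\sim\rho}[L(\theta)] \le \mathbb{E}_{\theta\sim\rho}[\hat{L}_S(\theta)] + \frac{1}{\lambda}\left[\mathrm{KL}(\rho\,\|\,\pi) + \ln\frac{1}{\delta} + \Psi_{\mathcal{P},\pi}(\lambda,n)\right],$$

where

$$\Psi_{\mathcal{P},\pi}(\lambda,n) = \ln \mathbb{E}_{\theta\sim\pi}\,\mathbb{E}_{S'\sim\mathcal{P}^n}\left[\exp\big(\lambda\,(L(\theta)-\hat{L}_{S'}(\theta))\big)\right].$$

Since $\Psi_{\mathcal{P},\pi}(\lambda,n)$ involves only the prior $\pi$, the data distribution $\mathcal{P}$, $\lambda$, and $n$, it does not depend on the posterior $\rho$. When the right-hand side is minimized over $\rho$, this term is an additive constant and can therefore be dropped from the objective, which is why it does not appear in Eq. (4).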

MengyuanChen21 commented 1 year ago

OK, I get it now. Thanks so much for the detailed reply! Thanks for your time and consideration!

danruod commented 1 year ago


It's my pleasure~