Digital-Dermatology / t-loss

Official code for Robust T-Loss for Medical Image Segmentation (MICCAI 2023)
https://robust-tloss.github.io/
Apache License 2.0

Correspondence between code and equation #2

Closed: JunMa11 closed this issue 10 months ago

JunMa11 commented 10 months ago

Hi @alvarogonjim ,

Thanks for the amazing work and congrats on the best paper award.

It seems that the code format doesn't align with the equation. Could you please reformat the code to match the equation?

[screenshots: the equation from the paper and the corresponding code]

BTW, I also added your great work to the loss odyssey repo: https://github.com/JunMa11/SegLoss

JunMa11 commented 10 months ago

Could you please also explain this variable a little bit?

https://github.com/Digital-Dermatology/t-loss/blob/37f6b7891db4629d5b9bd2125b31ed58e161bd8c/tloss.py#L87

alvarogonjim commented 10 months ago

Thank you, @JunMa11, for your interest in our work and for including T-Loss in the odyssey collection. In response to your question, our implementation follows a more general formulation in which $\mathbf{\Sigma}$ is an arbitrary diagonal matrix. Since this generality is not used in the paper, the formulation there is specialized to the case of the identity matrix.

Equation (2) in the paper can be rewritten as:

$$ -\log p(\mathbf{y}| \boldsymbol{\mu}, \mathbf{\Sigma}; \nu) = -\log \Gamma\left(\frac{\nu+D}{2}\right) +\log \Gamma\left(\frac{\nu}{2}\right) +\frac{1}{2} \log |\mathbf{\Sigma}| +\frac{D}{2} \log \pi +\frac{D}{2} \log \nu +\frac{\nu+D}{2} \log \left( 1+\frac{\left(\boldsymbol{y}-\boldsymbol{\mu}\right)^T \mathbf{\Sigma}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}\right)}{\nu} \right), $$

using $\log(\pi\nu) = \log\pi + \log\nu$. When $\mathbf{\Sigma}$ is a generic diagonal matrix with positive eigenvalues, we can write $\mathbf{\Sigma} = diag(e^{\lambda_d} + \epsilon)$ with $d=1,\ldots,D$, i.e., each diagonal element of $\mathbf{\Sigma}$ is parameterized on a log scale by $\lambda_d$, with $\epsilon$ as a numerical safeguard.
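To see that this rearrangement is correct, the term-by-term form can be compared against SciPy's multivariate Student-t density. This is a standalone sanity-check sketch, not code from the repository; it assumes `scipy.stats.multivariate_t` is available (SciPy ≥ 1.6), and all variable names are illustrative:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_t

rng = np.random.default_rng(0)
D, nu, eps = 5, 3.0, 1e-8
lam = rng.normal(size=D)            # log-scale parameters lambda_d
Sigma = np.diag(np.exp(lam) + eps)  # diagonal covariance as described above
mu = rng.normal(size=D)
y = rng.normal(size=D)

# Negative log-likelihood, term by term, as in the rewritten Equation (2)
quad = (y - mu) @ np.linalg.inv(Sigma) @ (y - mu)
nll = (
    -gammaln((nu + D) / 2)
    + gammaln(nu / 2)
    + 0.5 * np.log(np.linalg.det(Sigma))
    + (D / 2) * np.log(np.pi)
    + (D / 2) * np.log(nu)
    + ((nu + D) / 2) * np.log1p(quad / nu)
)

# Should agree with SciPy's Student-t up to floating-point error
ref = -multivariate_t(loc=mu, shape=Sigma, df=nu).logpdf(y)
assert np.isclose(nll, ref)
```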

  1. Since $\mathbf{\Sigma} = diag(e^{\lambda_d} + \epsilon)$, its determinant is easily computed as a product: $|\mathbf{\Sigma}| = \prod_{d=1}^D (e^{\lambda_d} + \epsilon)$. Because $e^{\lambda_d}$ is strictly positive, the safeguard $\epsilon$, which prevented logarithms of numerically small numbers, is no longer needed, and $\log|\mathbf{\Sigma}| = \sum_{d=1}^D \lambda_d$.

  2. The quadratic form $\left(\boldsymbol{y}-\boldsymbol{\mu}\right)^T \mathbf{\Sigma}\left(\boldsymbol{y}-\boldsymbol{\mu}\right)$ becomes a sum of the squared elements of $\left(\boldsymbol{y}-\boldsymbol{\mu}\right)$, each multiplied by the corresponding diagonal element of $\mathbf{\Sigma}$, i.e., $$\sum_{d=1}^D(e^{\lambda_d}+\epsilon)(\boldsymbol{y}-\boldsymbol{\mu})_d^2.$$ Both identities are checked numerically in the sketch below.
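As a quick numerical check of both points above (a standalone NumPy sketch, not code from the repository; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D, eps = 4, 1e-8
lam = rng.normal(size=D)        # the lambda_d parameters
diag = np.exp(lam) + eps        # diagonal of Sigma
r = rng.normal(size=D)          # stands in for (y - mu)

# Point 1: log|Sigma| is the sum of the logs of the diagonal; since
# e^{lambda_d} is strictly positive, dropping eps leaves sum(lambda_d).
log_det = np.log(np.linalg.det(np.diag(diag)))
assert np.isclose(log_det, np.sum(np.log(diag)))
assert np.isclose(log_det, np.sum(lam), atol=1e-6)  # eps shifts this by ~1e-8

# Point 2: a quadratic form with a diagonal matrix reduces to an
# element-wise weighted sum of squares.
assert np.isclose(r @ np.diag(diag) @ r, np.sum(diag * r**2))
```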

Performing these substitutions in the rewritten equation above gives:

$$ -\log p(\mathbf{y}| \boldsymbol{\mu}, \boldsymbol{\lambda}; \nu) = -\log \Gamma\left(\frac{\nu+D}{2}\right) +\log \Gamma\left(\frac{\nu}{2}\right) +\frac{1}{2} \sum_{d=1}^D \lambda_d +\frac{D}{2} \log \pi +\frac{D}{2} \log \nu +\frac{\nu+D}{2} \log \left(1+\frac{\sum_{d=1}^D (e^{\lambda_d}+\epsilon) (\boldsymbol{y}-\boldsymbol{\mu})_d^2}{\nu}\right). $$

This corresponds to the expression implemented in the code, with the exception of the third term, which in our script reads:

$$ \frac{1}{2} \sum_{d=1}^D (\lambda_d + \epsilon). $$

The mismatch is due to a typo, but it is numerically very small and furthermore constant during optimization. Nevertheless, since it was included in the runs for our paper, we decided to leave it as is for reproducibility. For all our experiments, we consider $\lambda_d = 1$ for $d=1,\ldots,D$, which is equivalent to $\mathbf{I}_D$ as reported in the paper. We will add a note in the code linking to this issue to avoid any future confusion.
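For reference, here is a minimal PyTorch sketch of how the final expression above maps to code, term by term. It is an illustration only, not the repository's tloss.py: the function name, tensor shapes, and the fixed `lambdas` are assumptions.

```python
import math
import torch

def t_loss_sketch(pred: torch.Tensor, target: torch.Tensor,
                  nu: torch.Tensor, lambdas: torch.Tensor,
                  eps: float = 1e-8) -> torch.Tensor:
    """Per-sample negative log-likelihood, term by term as in the equation
    above. pred/target: (B, H, W); lambdas: (H, W); nu: scalar tensor."""
    D = float(lambdas.numel())
    delta_sq = (pred - target) ** 2  # (y - mu)_d^2, one value per pixel
    # sum_d (e^{lambda_d} + eps) * (y - mu)_d^2, summed over all pixels
    quad = ((torch.exp(lambdas) + eps) * delta_sq).sum(dim=(1, 2))
    return (
        -torch.lgamma((nu + D) / 2)
        + torch.lgamma(nu / 2)
        + 0.5 * (lambdas + eps).sum()  # third term, incl. the eps discussed above
        + (D / 2) * math.log(math.pi)
        + (D / 2) * torch.log(nu)
        + ((nu + D) / 2) * torch.log1p(quad / nu)
    )

# With lambdas fixed to ones, matching the experiments described above:
loss = t_loss_sketch(torch.rand(2, 64, 64), torch.rand(2, 64, 64),
                     nu=torch.tensor(2.0), lambdas=torch.ones(64, 64)).mean()
```

In the repository itself, $\nu$ is a learnable parameter that is optimized during training, which is the central idea of the paper; refer to tloss.py for the exact implementation.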

JunMa11 commented 10 months ago

Thank you very much for the great explanation.