Mathematical Formulation of Objective Function

Schwartz-Zha commented 3 years ago

    def normal_flow(self, x, y_onehot):
        pixels = thops.pixels(x)
        z = x + torch.normal(mean=torch.zeros_like(x),
                             std=torch.ones_like(x) * (1. / 256.)) #original input added with a small noise
        logdet = torch.zeros_like(x[:, 0, 0, 0])
        logdet += float(-np.log(256.) * pixels)   # ??? How should we add this here?
        # encode
        z, objective = self.flow(z, logdet=logdet, reverse=False)
        # prior
        mean, logs = self.prior(y_onehot)
        objective += modules.GaussianDiag.logp(mean, logs, z)

        if self.hparams.Glow.y_condition:
            y_logits = self.project_class(z.mean(2).mean(2))
        else:
            y_logits = None

        # return
        nll = (-objective) / float(np.log(2.) * pixels)
        return z, nll, y_logits

This normal_flow() function is the core forward propagation part of Glow. "nll" here is just passed to this line in trainer.py

loss_generative = Glow.loss_generative(nll)

And Glow.loss_generative is just a static function (just take the mean)

@staticmethod
    def loss_generative(nll):
        # Generative loss
        return torch.mean(nll)

So basically nll is just the loss. Then move our attention to "objective", from my understanding of each nn.Module in this project, it's the sum of log determinant of each nn.Module .

Then let's take a broad view of the whole model. We start from z (input x added with some noise), go through many transformations $T=[T_1, T_2\dots]$ So, the final output $o =T(z)= \dots(T_2(T_1(z)))$

Suppose p() represents the likelihood of each instance, and x represents a real image from the dataset, the object of this generative model should be $\Sigma_{i=1}^np(x_i) =\Sigma_{i=1}^np(z_i)|det(\nabla_zT(z_i))|^{-1}$ The right hand side is always smaller than the left side, so the objective is just to enlarge the rhs. So the objective function of optimization is $\hat{T} = argmin\Bigg\{ \Sigma_{i=1}^n\Big\{ log\Big|det(\nabla_zT(z_i))\Big| -log(p(z_i)) \Big\} \Bigg\}$

But the computation of "nll" shows that it's actually $argmin\Bigg\{ \Sigma_{i=1}^n\Big\{- log\Big|det(\nabla_zT(z_i))\Big| -log(p(z_i)) \Big\} \Bigg\}$ Is this really correct?

And by the way, what's the point of this line?

logdet += float(-np.log(256.) * pixels)

chaiyujin commented 3 years ago

As far as I am concerned, the GLOW (or normalization flow) model try to max likelihood of training data, which means max the log probability (the var objective in my code) and thus means min the -objective.

logdet += float(-np.log(256.) * pixels)

About this line, I cannot recall the details. It is supposed to handle some extra multiplication factor out of det().

-np.log(256.) is used to approx intergration for discrete pixel values.
pixels handles the (width, height) dimension of input.

Schwartz-Zha commented 3 years ago

Well, from my reading in another paper "Generative Flow via n * n Convolution: Thanh-Dat Truong et al.". The data is added with a random uniform noise, which is the discretization level of the data, making the input continuous.

Could you get some hint? (I'm still confused...)

HuangChiEn commented 7 months ago

float(-np.log(256.) * pixels)

discretized level is a number denote the range of 8 bit pixel (2^8=256). So, glow paper mentioned c=-M x log a, where a=256 in Eq(2), while M is the number of total pixels in your image (i.e. 32 32 image, `M=3232`).

HuangChiEn commented 7 months ago

@chaiyujin i wonder why the objectives will need to divided by float(np.log(2.) * pixels ? what is that means ?

chaiyujin / glow-pytorch

Mathematical Formulation of Objective Function #24