The SV2P implementation was taken from the author's implementation with minimal changes (including the naming of the tensors). The original SV2P implementation also uses the default ReLU activation.
The `latent_std` tensor in the SV2P model is indeed the log-variance (not the standard deviation). That tensor is shifted by `latent_std_min`, so in the SV2P model the log-variance takes values greater than or equal to `latent_std_min`. I also clip the log-variance in my model: in the SAVP model it takes values between -10 and 10. It's common to lower-bound the log-variance to prevent the Gaussian from mode-collapsing during training. Note that a log-variance of -10 already corresponds to a very small variance (around 4.5e-05).
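For concreteness, here is a minimal sketch of what bounding the log-variance in these two ways can look like (a hedged illustration in TensorFlow; the names `raw`, `logvar_savp`, `logvar_sv2p` and the shown `latent_std_min` value are illustrative, not taken from either repository):

```python
import tensorflow as tf

# Illustrative encoder output that is interpreted as a log-variance.
raw = tf.random.normal([8, 16])  # stand-in for the encoder's second head

# SAVP-style clipping: keep the log-variance in [-10, 10].
logvar_savp = tf.clip_by_value(raw, -10.0, 10.0)

# SV2P-style lower bound: a ReLU'd output (>= 0) shifted by latent_std_min
# is always >= latent_std_min.
latent_std_min = -5.0  # illustrative value only
logvar_sv2p = tf.nn.relu(raw) + latent_std_min

# A log-variance of -10 is already a tiny variance: exp(-10) ≈ 4.5e-05.
variance = tf.exp(logvar_savp)
```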
Thanks @alexlee-gk for the clear explanation. BTW, do you see mode collapse often when the log-variance is not clipped?
I used to get NaNs in an earlier implementation, though I haven’t checked if that’s still the case with the current implementation. If you try it, I’d be interested to know the results.
Thanks, will let you know if I get any results.
Hi Alex, I've read your excellent SAVP paper as well as the previous SV2P and CDNA papers and got one question about the SV2P implementation.
I noticed that in the VAE encoder tower function, the default ReLU activation is used to compute a tensor called `latent_std`. Judging from this name, I assumed it is the standard deviation of the latent variables. However, when sampling from the latent distribution and computing the KL divergence, this tensor seems to be treated as the log-variance of the latent variable. Since the log-variance can take both positive and negative values, ReLU does not seem suitable as the output activation. So why use ReLU here? Apologies if I missed anything in the code.
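(For reference, when that tensor is interpreted as a log-variance, the sampling and KL computation typically look like the following. This is a generic VAE sketch, not code from the repository; the function and argument names are illustrative.)

```python
import tensorflow as tf

def sample_and_kl(latent_mean, latent_logvar):
    """Reparameterized sample and KL(q(z|x) || N(0, I)) when the second
    encoder output is treated as a log-variance."""
    std = tf.exp(0.5 * latent_logvar)              # sigma = exp(logvar / 2)
    eps = tf.random.normal(tf.shape(latent_mean))
    z = latent_mean + std * eps                    # reparameterization trick
    # Closed-form KL between N(mean, sigma^2) and N(0, 1), summed over latent dims.
    kl = 0.5 * tf.reduce_sum(
        tf.exp(latent_logvar) + tf.square(latent_mean) - 1.0 - latent_logvar,
        axis=-1)
    return z, kl
```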