The SV2P implementation was taken from the author's implementation with minimal changes (including the naming of the tensors). The original SV2P implementation also uses the default ReLU activation.
The `latent_std` tensor in the SV2P model is indeed the log-variance (not the standard deviation). That tensor is shifted by `latent_std_min`, so in the SV2P model the log-variance takes values greater than or equal to `latent_std_min`. I also clip the log-variance in my model: in the SAVP model it takes values between -10 and 10. It's common to lower-bound the log-variance to prevent the Gaussian from mode-collapsing during training. Note that a log-variance of -10 already corresponds to a very small variance (around 4.5e-05).
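For concreteness, here is a minimal sketch of what bounding the log-variance in these two ways can look like (a hedged illustration in TensorFlow; the names `raw`, `logvar_savp`, `logvar_sv2p` and the shown `latent_std_min` value are illustrative, not taken from either repository):

```python
import tensorflow as tf

# Illustrative encoder output that is interpreted as a log-variance.
raw = tf.random.normal([8, 16])  # stand-in for the encoder's second head

# SAVP-style clipping: keep the log-variance in [-10, 10].
logvar_savp = tf.clip_by_value(raw, -10.0, 10.0)

# SV2P-style lower bound: a ReLU'd output (>= 0) shifted by latent_std_min
# is always >= latent_std_min.
latent_std_min = -5.0  # illustrative value only
logvar_sv2p = tf.nn.relu(raw) + latent_std_min

# A log-variance of -10 is already a tiny variance: exp(-10) ≈ 4.5e-05.
variance = tf.exp(logvar_savp)
```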
Thanks @alexlee-gk for the clear explanation. BTW, do you see mode collapse often when the log-variance is not clipped?
I used to get NaNs in an earlier implementation, though I haven’t checked if that’s still the case with the current implementation. If you try it, I’d be interested to know the results.
Thanks, will let you know if I get any results.
Hi Alex, I've read your excellent SAVP paper as well as the previous SV2P and CDNA papers and got one question about the SV2P implementation.
I noticed that in the VAE encoder tower function, the default ReLU activation is used to compute a tensor called `latent_std`. Judging from this name, I assumed it is the standard deviation of the latent variables. However, when sampling from the latent distribution and computing the KL divergence, this tensor seems to be treated as the log-variance of the latent variable. Since the log-variance can take both positive and negative values, ReLU does not seem suitable as the output activation. So why use ReLU here? Apologies if I missed anything in the code.
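(For reference, when that tensor is interpreted as a log-variance, the sampling and KL computation typically look like the following. This is a generic VAE sketch, not code from the repository; the function and argument names are illustrative.)

```python
import tensorflow as tf

def sample_and_kl(latent_mean, latent_logvar):
    """Reparameterized sample and KL(q(z|x) || N(0, I)) when the second
    encoder output is treated as a log-variance."""
    std = tf.exp(0.5 * latent_logvar)              # sigma = exp(logvar / 2)
    eps = tf.random.normal(tf.shape(latent_mean))
    z = latent_mean + std * eps                    # reparameterization trick
    # Closed-form KL between N(mean, sigma^2) and N(0, 1), summed over latent dims.
    kl = 0.5 * tf.reduce_sum(
        tf.exp(latent_logvar) + tf.square(latent_mean) - 1.0 - latent_logvar,
        axis=-1)
    return z, kl
```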