Hi @xuesongwang, thanks for the kind words, but I don't completely understand what you are saying. Are you essentially suggesting to mask Y_trgt but still use X_trgt during evaluation? If so, I don't really see how that can help: the decoder already has access to X_trgt ...
Thanks @YannDubs for pointing that out. What I was trying to say is that during model evaluation, the encoder already has access to both X_trgt and Y_trgt via R_from_trgt = self.encode_globally(X_trgt, Y_trgt), and then generates z_sample from the posterior distribution for decoding. During testing, however, z_sample has to come from the prior distribution, R = self.encode_globally(X_cntxt, Y_cntxt), because Y_trgt (the ground truth) is unavailable. If there is a distributional gap D(z_sample_post || z_sample_prior), then R and R_from_trgt will differ, and the model will underperform on the test set. Hence, why not use the prior distribution when saving/selecting the model instead? To achieve this, Y_trgt can be set to None during evaluation so that sampling_dist = q_zCc is used.
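Roughly what I mean, as a toy self-contained sketch (PyTorch). The names mirror the ones above (encode_globally, q_zCc, ...) but this is not the actual base.py implementation, and the real posterior also conditions on the context set, which I omit here for brevity:

```python
import torch
from torch.distributions import Normal

def encode_globally(X, Y):
    # Stand-in for the global encoder: mean-pool a concatenation of (x, y) pairs.
    return torch.cat([X, Y], dim=-1).mean(dim=1)          # [batch, r_dim]

def infer_latent_dist(R):
    # Stand-in for the MLP mapping the representation R to a latent distribution.
    return Normal(loc=R, scale=torch.ones_like(R))

def latent_sample(X_cntxt, Y_cntxt, X_trgt, Y_trgt=None):
    R = encode_globally(X_cntxt, Y_cntxt)
    q_zCc = infer_latent_dist(R)                           # prior q(z | C)
    if Y_trgt is None:
        sampling_dist = q_zCc                              # test-time path (what I propose for evaluation)
    else:
        R_from_trgt = encode_globally(X_trgt, Y_trgt)
        sampling_dist = infer_latent_dist(R_from_trgt)     # posterior q(z | C, T), used when Y_trgt is given
    return sampling_dist.rsample(), q_zCc

# Toy usage: batch of 4 functions, 10 context and 20 target points, 1D inputs/outputs.
X_c, Y_c = torch.randn(4, 10, 1), torch.randn(4, 10, 1)
X_t, Y_t = torch.randn(4, 20, 1), torch.randn(4, 20, 1)
z_post, _ = latent_sample(X_c, Y_c, X_t, Y_t)              # posterior sampling (current evaluation)
z_prior, _ = latent_sample(X_c, Y_c, X_t, Y_trgt=None)     # prior sampling (what I propose)
```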
But the KL divergence that is used during training essentially ensures that the distributional gap stays small. It's the same reason that you use the posterior during training of a VAE (i.e., condition on the image) but only generate from the prior when evaluating a VAE.
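For reference, the (NPVI-style) lower bound that is maximized during training is roughly

log p(Y_trgt | X_trgt, C) >= E_{z ~ q(z | C, T)} [ log p(Y_trgt | X_trgt, z) ] - KL( q(z | C, T) || q(z | C) ),

so the posterior q_zCct is explicitly pulled towards the prior q_zCc, which is what keeps the gap you describe small.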
Check out section 4.1 of the ConvNP paper for the derivations / theoretical explanations: https://arxiv.org/pdf/2007.01332.pdf
Dear Dubois, first of all, thanks for this amazing project! The NPF general framework is beautifully designed. However, I have one issue regarding model evaluation.
For latent-based methods, you mention on the website that "when evaluating we will evaluate the log likelihood using posterior sampling". Based on the code below:
https://github.com/YannDubs/Neural-Process-Family/blob/892d0439614804ee671d66464fcb7d46ab43629b/npf/neuralproc/base.py#L500-L505
when is_q_zCct=True and Y_target is given, the posterior distribution is used for inference.
But why not mask Y_target during evaluation and save the model based on results sampled from the prior distribution (R from the context X, Y)? Would it be possible that a model learns a good decoder(posterior_z_samples, X_target) while the divergence D(post_z || prior_z) is large? In that case, the performance on the test set would be horrible.
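What I have in mind is something like the following hypothetical sketch (not the repo's actual API; I'm assuming the model returns a predictive distribution and that passing Y_trgt=None makes it sample z from the prior):

```python
import math
import torch

def eval_loglik_prior(model, X_cntxt, Y_cntxt, X_trgt, Y_trgt, n_z_samples=16):
    """Log-likelihood estimate with z sampled from the prior q(z | C), i.e. Y_trgt masked from the encoder."""
    logliks = []
    for _ in range(n_z_samples):
        # Hypothetical call signature: Y_trgt=None would force sampling_dist = q_zCc.
        p_yCc = model(X_cntxt, Y_cntxt, X_trgt, Y_trgt=None)
        logliks.append(p_yCc.log_prob(Y_trgt).sum(dim=(-2, -1)))
    # Simple Monte-Carlo estimate over latent samples: log( (1/S) * sum_s exp(loglik_s) ).
    return torch.logsumexp(torch.stack(logliks, dim=0), dim=0) - math.log(n_z_samples)
```

The model would then be saved / selected based on this number rather than on the posterior-sampled log-likelihood.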
Any insight will be appreciated. Thanks.