cwognum closed this issue 2 years ago
Ah wait, I think I see it now. I should be computing the column-wise covariance, not the row-wise one. Since torch.cov treats rows as variables and columns as observations, that means transposing the input first:
cov_torch = torch.cov(phis[0].T)
# tensor([[0.0000, 0.0000],
# [0.0000, 0.5000]])
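For anyone running into the same confusion, here is a minimal sketch of the two conventions. The `features` tensor is made up for illustration (3 samples, 2 features) and is not the `phis[0]` from this issue; `cov_manual` is just the textbook unbiased estimator written out by hand for comparison.

```python
import torch

# Hypothetical batch: 3 samples (rows), 2 features (columns).
features = torch.tensor([[1.0, 2.0],
                         [3.0, 6.0],
                         [5.0, 7.0]])

# torch.cov treats each ROW as a variable, so this is a 3x3 matrix
# (covariance between samples), which is usually not what is wanted here.
cov_rows = torch.cov(features)

# Transposing makes each feature a variable: a 2x2 feature covariance.
cov_cols = torch.cov(features.T)

# The same quantity written out by hand: center each column, then
# divide the Gram matrix by (n - 1) for the unbiased estimate.
n = features.shape[0]
centered = features - features.mean(dim=0, keepdim=True)
cov_manual = centered.T @ centered / (n - 1)

print(cov_rows.shape)  # torch.Size([3, 3])
print(cov_cols.shape)  # torch.Size([2, 2])
print(torch.allclose(cov_cols, cov_manual))  # True
```

If the manual version and `torch.cov(x.T)` agree on your own features, the remaining difference is most likely in the normalization or in extra terms, not in the covariance itself.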
I'm still not 100% sure why both the first and second moments are matched rather than just the second, but I can see why that would be important. Closing the issue for now.
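One possible reason for matching the first moment as well: covariance is invariant to a constant shift, so a covariance-only penalty cannot see a difference in means. The sketch below illustrates this with made-up data. The names `column_cov`, `cov_diff`, and `mean_diff` are chosen for illustration, and the squared-difference form of both terms is an assumption, not necessarily what the repository or the paper computes.

```python
import torch

torch.manual_seed(0)

# Hypothetical feature batches: 8 samples, 3 features each.
source = torch.randn(8, 3)
# Target is the same batch shifted by a constant offset per feature,
# so the two domains differ only in their means.
target = source + torch.tensor([2.0, -1.0, 0.5])


def column_cov(x: torch.Tensor) -> torch.Tensor:
    """Unbiased covariance between the columns (features) of x."""
    centered = x - x.mean(dim=0, keepdim=True)
    return centered.T @ centered / (x.shape[0] - 1)


# Second-moment term: mean squared difference between covariances.
cov_diff = (column_cov(source) - column_cov(target)).pow(2).mean()
# First-moment term: mean squared difference between feature means.
mean_diff = (source.mean(dim=0) - target.mean(dim=0)).pow(2).mean()

print(cov_diff)   # numerically zero: the shift is invisible to the covariance
print(mean_diff)  # clearly nonzero: only this term detects the shift
```

So with a covariance term alone, two domains that differ only by an offset would be scored as perfectly aligned.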
Following equations (1), (2), and (3) in the paper, I am not sure that the current implementation computes the same thing.
Equations from paper
Code example
The output differs even more for the second domain:
Putting it all together, the resulting penalties differ quite significantly too:
And that is without even considering mean_diff, which does not seem to be mentioned in the paper either.
I can imagine that I'm missing something here. Could you elaborate on whether the deviation from the paper is intentional and, if so, why?