Closed alextzik closed 2 years ago
I think they end up being the same because P(y,X | theta) = P(y | X, theta) P(X | theta) and P(X | theta) is uniform.
Here is another snippet from Gaussian Processes for Machine Learning:
We don't control X in any way - we just want every X that we do have to be paired with a predictable y.
The distribution we are determining using MLE is not a conditional, but rather a joint over y with X as parameters. It should thus be p(y; X, theta) and not p(y | X, theta), right?
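To make the distinction concrete: in GP regression the inputs X enter the likelihood only through the kernel matrix — they are conditioned on (or treated as fixed parameters), never given a distribution of their own, which is why maximizing p(y | X, theta) and a "joint" with a flat/ignored p(X) coincide. A minimal NumPy sketch of the log marginal likelihood log p(y | X, theta) for an RBF kernel (function name and hyperparameter names are illustrative, not from the thread):

```python
import numpy as np

def gp_log_marginal_likelihood(X, y, lengthscale, signal_var, noise_var):
    """Log p(y | X, theta) for a zero-mean GP with an RBF kernel.

    Note that X appears only inside the kernel matrix K(X, X):
    it is conditioned on, not modeled, so MLE over theta is the
    same whether we call the objective conditional or joint.
    """
    n = X.shape[0]
    # Pairwise squared distances between input points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = signal_var * np.exp(-0.5 * d2 / lengthscale**2) + noise_var * np.eye(n)
    # Cholesky factorization for a stable solve and log-determinant
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * n * np.log(2.0 * np.pi))
```

Maximizing this expression over (lengthscale, signal_var, noise_var) is the standard type-II MLE for GP hyperparameters; X is held fixed throughout.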