Akella17 opened this issue 4 years ago
jacobrgardner commented:

I'm somewhat confused -- why would the GP MLL loss have a `num_data` shape? The MLL doesn't decompose as a sum over the individual data points, so there isn't really a good analogue of a "per-data-point" loss for GPs.
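For reference, the exact GP marginal log likelihood over all `n` training points is the standard expression below; both the quadratic form and the log-determinant couple every data point through the kernel matrix, so there is no per-point term to split out:

```latex
\log p(\mathbf{y} \mid X)
  = -\tfrac{1}{2}\, \mathbf{y}^\top \left(K_{XX} + \sigma^2 I\right)^{-1} \mathbf{y}
    - \tfrac{1}{2} \log \det\left(K_{XX} + \sigma^2 I\right)
    - \tfrac{n}{2} \log 2\pi
```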
Akella17 replied:

@jacobrgardner Thanks for the quick response. I agree with what you are saying, but what I don't understand is why the loss is non-zero for all the GP heads when `num_data = 1` (only one GP head is used/gathered for computing the loss). In other words, I was expecting the loss to be zero for every GP head except the one gathered for computing the loss.
I am using the batch feature to deploy multiple independent GPs, but the loss can only be computed for one GP head per training example. Here is how I implemented it:
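The original snippet isn't reproduced here, so the following is only a minimal sketch of this kind of setup, following GPyTorch's batch GP pattern; `BatchGPModel`, `num_gp_heads`, and the toy data are placeholders rather than the actual code from the issue:

```python
import torch
import gpytorch

num_gp_heads = 4  # hypothetical number of independent GP heads

class BatchGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        batch_shape = torch.Size([num_gp_heads])
        # Batched mean and kernel: one set of hyperparameters per head
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=batch_shape)
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=batch_shape),
            batch_shape=batch_shape,
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# Shared inputs; one row of (toy) targets per head
train_x = torch.linspace(0, 1, 10)
train_y = torch.randn(num_gp_heads, 10)

likelihood = gpytorch.likelihoods.GaussianLikelihood(
    batch_shape=torch.Size([num_gp_heads])
)
model = BatchGPModel(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

model.train()
likelihood.train()
output = model(train_x)
loss = -mll(output, train_y)  # one MLL value per head: shape (num_gp_heads,)
```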
However, the loss has `num_GP_heads` shape instead of `num_data` shape. Moreover, even when `num_data` is 1 (meaning that the loss needs to be computed for only one of the GP heads), the loss is `num_GP_heads`-dimensional and non-zero for all GP heads. Is there anything wrong with my implementation?
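If only one head should contribute to a given update, one option (continuing the hypothetical sketch above, with `head_idx` as a placeholder index for the gathered head) is to index the per-head MLL vector before backpropagating:

```python
per_head_loss = -mll(output, train_y)  # shape (num_gp_heads,): each head's MLL over its own targets
head_idx = 2                           # hypothetical: the single head gathered for this example
loss = per_head_loss[head_idx]
loss.backward()  # the heads are independent, so only this head's hyperparameters get nonzero gradients
```

Each entry of `per_head_loss` is that head's full MLL over its own row of `train_y`, which is presumably why the other entries are non-zero even when only one head is used downstream.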