Also, I would like to know the reason for setting data_size_for_hsic in the HSIC calculation
data_size_for_hsic = 512
for i in range(partial_is_in_interval.shape[0]):
    v = partial_is_in_interval[i, :].reshape((n,1))
    l = partial_interval_sizes[i, :].reshape((n,1))
    v = v[:data_size_for_hsic]
    l = l[:data_size_for_hsic]
    if torch.max(v) - torch.min(v) > 0.05: # in order to not get hsic = 0
        curr_hsic = torch.abs(torch.sqrt(HSIC(v, l)))
    else:
        curr_hsic = 0
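These lines compute the HSIC only for pairs of coverage and interval-length vectors in which the coverage vector contains at least one coverage event and at least one miscoverage event. If the coverage vector is (nearly) constant, the HSIC term for that pair is set to 0.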
The parameter 'data_size_for_hsic' sets the maximum number of samples used to compute the HSIC. For example, if the batch size is 1024, the HSIC penalty is computed on only the first 512 samples. The reason for this is to reduce the computational cost.
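
To illustrate why truncating helps, here is a minimal, dependency-free sketch of the same pattern. It is not the repository's HSIC function: it assumes the standard biased HSIC estimator with a Gaussian kernel of fixed bandwidth, and the names gaussian_gram, center, biased_hsic and truncated_hsic are hypothetical. The point is that the estimator builds n-by-n kernel matrices, so halving the number of samples from 1024 to 512 cuts the number of kernel entries by a factor of four.

```python
import math
import random


def gaussian_gram(x, sigma=1.0):
    """n-by-n Gaussian kernel matrix for a list of scalars (assumed kernel)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in x] for a in x]


def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def biased_hsic(x, y, sigma=1.0):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2, at O(n^2) cost."""
    n = len(x)
    Kc = center(gaussian_gram(x, sigma))
    Lc = center(gaussian_gram(y, sigma))
    # Both centered matrices are symmetric, so the trace of their product
    # equals the sum of their elementwise products.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def truncated_hsic(coverage, lengths, max_samples=512, min_range=0.05):
    """Mirror of the quoted loop body: truncate, check variation, then take sqrt(HSIC)."""
    v = coverage[:max_samples]
    l = lengths[:max_samples]
    if max(v) - min(v) <= min_range:
        # Coverage vector is (nearly) constant, so skip the HSIC computation.
        return 0.0
    # Clamp at 0 to guard against tiny negative values from rounding
    # (an assumption of this sketch; the quoted code uses abs(sqrt(...))).
    return math.sqrt(max(biased_hsic(v, l), 0.0))


random.seed(0)
batch = 256  # small batch so the pure-Python demo runs quickly
coverage = [1.0 if random.random() < 0.9 else 0.0 for _ in range(batch)]
lengths = [random.uniform(0.5, 2.0) for _ in range(batch)]
print(truncated_hsic(coverage, lengths, max_samples=128))
```

With max_samples=128 the sketch evaluates 128 * 128 kernel entries per matrix instead of 256 * 256. The same ratio applies to 512 versus 1024 in the quoted code.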
@Shai128 I would like to know what the following lines of code in the independence_penalty function do. Also, I would like to know the reason for setting data_size_for_hsic in the HSIC calculation.