independence_penalty - Githubissues

severous commented 1 year ago

@Shai128 I would like to know what the following lines of code in the independence_penalty function do:

  partial_interval_sizes = interval_sizes[abs(torch.min(is_in_interval, dim=1)[0] -
                                              torch.max(is_in_interval, dim=1)[0]) > 0.05, :]
  partial_is_in_interval = is_in_interval[abs(torch.min(is_in_interval, dim=1)[0] -
                                              torch.max(is_in_interval, dim=1)[0]) > 0.05, :]

Also, I would like to know the reason for setting data_size_for_hsic in the HSIC calculation:

        data_size_for_hsic = 512
        for i in range(partial_is_in_interval.shape[0]):

            v = partial_is_in_interval[i, :].reshape((n,1))
            l = partial_interval_sizes[i, :].reshape((n,1))
            v = v[:data_size_for_hsic]
            l = l[:data_size_for_hsic]
            if torch.max(v) - torch.min(v) > 0.05:  # in order to not get hsic = 0
                curr_hsic = torch.abs(torch.sqrt(HSIC(v, l)))
            else:
                curr_hsic = 0

Shai128 commented 1 year ago

Great questions!

These lines keep only the rows of interval_sizes and is_in_interval whose coverage vector contains at least one coverage event and at least one miscoverage event, i.e. rows whose minimum and maximum coverage values differ by more than 0.05.
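
As a rough illustration of that filtering, here is a minimal sketch on made-up toy tensors. The values and the name has_both_events are assumptions for illustration only and are not taken from the repository; the mask is computed once and reused, which should select the same rows as the two lines quoted in the question.

```python
import torch

# Toy coverage indicators (made-up values): one row per coverage vector.
is_in_interval = torch.tensor([
    [1.0, 1.0, 1.0, 1.0],   # every sample covered   -> constant row
    [1.0, 0.0, 1.0, 0.0],   # covered and miscovered -> mixed row
    [0.0, 0.0, 0.0, 0.0],   # every sample missed    -> constant row
])
# Toy interval lengths (made-up values), same shape as is_in_interval.
interval_sizes = torch.tensor([
    [2.0, 2.5, 3.0, 3.5],
    [1.0, 1.5, 2.0, 2.5],
    [0.5, 0.6, 0.7, 0.8],
])

# Per-row spread of the coverage values; torch.min/max with dim return (values, indices).
row_min = torch.min(is_in_interval, dim=1)[0]
row_max = torch.max(is_in_interval, dim=1)[0]

# has_both_events is a hypothetical name: True where a row is not (nearly) constant.
has_both_events = (row_max - row_min) > 0.05

# Apply the same boolean row mask to both tensors so the rows stay paired.
partial_interval_sizes = interval_sizes[has_both_events, :]
partial_is_in_interval = is_in_interval[has_both_events, :]
```

In this toy case only the middle row survives, since the other two rows are constant.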
The parameter 'data_size_for_hsic' sets the maximum number of samples used to compute the HSIC. For example, if the batch size is 1024, the HSIC penalty is computed on only the first 512 samples. This reduces the computational cost.
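
To make the saving concrete, here is a small sketch with random stand-in data. It assumes a kernel-based HSIC estimator that builds pairwise m x m matrices over the m samples it receives, which is typical for HSIC but is not confirmed here for the HSIC function in this repository; the batch size and variable names other than data_size_for_hsic are hypothetical.

```python
import torch

n = 1024                    # hypothetical batch size
data_size_for_hsic = 512    # cap used in the snippet from the question

v = torch.rand(n, 1)        # stand-in for one row of coverage indicators
l = torch.rand(n, 1)        # stand-in for the matching interval sizes

# Slicing keeps the first data_size_for_hsic samples.
v_sub = v[:data_size_for_hsic]
l_sub = l[:data_size_for_hsic]

# If the batch is smaller than the cap, slicing simply keeps everything.
small = torch.rand(100, 1)
small_sub = small[:data_size_for_hsic]

# Assumption: the estimator works on pairwise (m x m) matrices,
# so the number of matrix entries grows quadratically with m.
full_entries = n * n
capped_entries = v_sub.shape[0] ** 2
```

Under that assumption, halving the sample count from 1024 to 512 cuts the pairwise matrix size by a factor of four.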

I hope you find this answer helpful!

severous commented 1 year ago

Thanks for your help. :)

Shai128 / oqr

independence_penalty #1