ANM, lack of Gamma in "Gamma HSIC"

ArnoVel commented 4 years ago

Hi, this might simply be a conceptual problem, or a lack of knowledge on my part. Usually, HSIC can be used to compare two ANM candidates either by comparing the test statistics directly or by computing the corresponding p-values. However, computing a p-value requires some notion of the distribution of HSIC under the null hypothesis. The classic paper by Gretton et al. proposes a Gamma approximation, giving specific plug-in values for the two Gamma parameters in terms of the expectation and variance of HSIC. If I had to compute the p-value myself, I would use that approximation and evaluate the Gamma CDF parametrized by those values.
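A minimal sketch of the Gamma approximation described above, assuming Gaussian kernels with a median-heuristic bandwidth and the plug-in mean and variance estimates as I recall them from the Gretton et al. reference MATLAB code. The function names (`rbf_gram`, `hsic_gamma_pvalue`) are illustrative and are not part of this toolbox.

```python
import numpy as np
from scipy.stats import gamma


def rbf_gram(v):
    """Gaussian Gram matrix; bandwidth from the median heuristic (an assumption)."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    sq = np.sum(v ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * v @ v.T, 0.0)
    sigma2 = 0.5 * np.median(d2[d2 > 0])  # sigma^2 = half the median squared distance
    return np.exp(-d2 / (2.0 * sigma2))


def hsic_gamma_pvalue(x, y):
    """Return (m * HSIC_b, p-value under the Gamma approximation of the null)."""
    m = len(x)
    K, L = rbf_gram(x), rbf_gram(y)
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H

    # Test statistic: m * HSIC_b, with HSIC_b = trace(K H L H) / m^2
    stat = np.sum(Kc * Lc) / m

    # Plug-in estimate of Var(HSIC_b) under the null
    V = (Kc * Lc / 6.0) ** 2
    var = (np.sum(V) - np.trace(V)) / (m * (m - 1))
    var *= 72.0 * (m - 4) * (m - 5) / (m * (m - 1) * (m - 2) * (m - 3))

    # Plug-in estimate of E[HSIC_b] under the null (off-diagonal kernel means)
    mu_x = (np.sum(K) - np.trace(K)) / (m * (m - 1))
    mu_y = (np.sum(L) - np.trace(L)) / (m * (m - 1))
    mean = (1.0 + mu_x * mu_y - mu_x - mu_y) / m

    # Gamma parameters for the distribution of m * HSIC_b
    shape = mean ** 2 / var
    scale = m * var / mean

    # p-value = upper tail of the Gamma distribution at the observed statistic
    return stat, gamma.sf(stat, a=shape, scale=scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    print(hsic_gamma_pvalue(x, rng.normal(size=200)))            # independent pair
    print(hsic_gamma_pvalue(x, x + 0.1 * rng.normal(size=200)))  # dependent pair
```

A small p-value is evidence against independence, so for an ANM candidate one would look for a large p-value between the putative cause and the regression residuals.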

I am aware there might be other ways to do this. However, your snippet in the anm method does not seem to compute p-values, only test statistics. I might be wrong, but the variable names as well as the description of the method suggest this.

Am I wrong? Right? If either, how so?

Thanks for any additional information on this topic. Ideally, I would like to design a test that detects whether a model satisfies an ANM, with low Type I and Type II error.

ArnoVel commented 4 years ago

For future reference: this test essentially compares the test statistics m*HSIC_b. It is called "Gamma HSIC" not because the Gamma approximation is used here, but because the reference paper applies the Gamma approximation to the same quantity (m*HSIC_b).
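As a rough illustration of comparing the raw statistics m*HSIC_b for the two ANM candidates, here is a sketch under stated assumptions: it uses a plain polynomial fit as a stand-in regressor and Gaussian kernels with a median-heuristic bandwidth, which is not necessarily what the toolbox does. The names `m_hsic_b`, `residuals` and `anm_direction` are hypothetical.

```python
import numpy as np


def m_hsic_b(a, b):
    """Biased statistic m * HSIC_b = trace(K H L H) / m with Gaussian kernels."""
    m = len(a)

    def gram(v):
        v = np.asarray(v, dtype=float).reshape(m, -1)
        sq = np.sum(v ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * v @ v.T, 0.0)
        # Median heuristic: 2 * sigma^2 equals the median squared distance
        return np.exp(-d2 / np.median(d2[d2 > 0]))

    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return np.sum((H @ gram(a) @ H) * gram(b)) / m


def residuals(cause, effect, degree=3):
    """Residuals of a polynomial regression (a stand-in for any regressor)."""
    coeffs = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coeffs, cause)


def anm_direction(x, y):
    """Prefer the direction whose residuals look more independent of the input."""
    stat_xy = m_hsic_b(x, residuals(x, y))
    stat_yx = m_hsic_b(y, residuals(y, x))
    direction = "x->y" if stat_xy < stat_yx else "y->x"
    return direction, stat_xy, stat_yx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2.0, 2.0, size=400)
    y = x ** 3 + rng.normal(size=400)
    print(anm_direction(x, y))
```

Comparing statistics only ranks the two directions; it does not say whether either direction is actually consistent with an ANM, which is where a p-value would help.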

diviyank commented 4 years ago

Hi, you are correct: only the test statistic is computed, not the p-value (ref: the authors' code here: http://web.math.ku.dk/~peters/code.html). We might want to include the p-value computation, at least as information for users.

Feel free to make a pull request; it might take some time before I can look into it. Best regards, Diviyan

ArnoVel commented 4 years ago

Hi, I am a little bit busy atm, but I can point to two possible sources for an easy implementation:

A Python copy of the original Gretton et al. MATLAB code. It uses NumPy and vectorises on CPU only.
My PyTorch (GPU-compatible) update. It resorts to SciPy for the inverse CDF, however, so while most of the computations can be performed on GPU, there is a limitation there. Also, I have a nonstandard way to specify kernels, but that can be changed easily!
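To make the inverse-CDF step concrete, here is a small sketch of how the rejection threshold could be obtained with SciPy once the two Gamma parameters are known. The numeric values are arbitrary placeholders, and `gamma_threshold` is a hypothetical helper name; parameters computed on a GPU would first need converting to plain Python floats (for PyTorch tensors, for instance via `.item()`).

```python
from scipy.stats import gamma


def gamma_threshold(shape, scale, alpha=0.05):
    """(1 - alpha) quantile of the Gamma approximation to the null distribution."""
    return gamma.ppf(1.0 - alpha, a=shape, scale=scale)


# Placeholder values for illustration only; in practice the shape and scale
# come from the plug-in estimates and stat is the observed m * HSIC_b.
shape, scale, stat = 2.5, 0.12, 0.9

thresh = gamma_threshold(shape, scale)
reject = stat > thresh                            # decision via the inverse CDF
p_value = gamma.sf(stat, a=shape, scale=scale)    # equivalent decision: p_value < alpha
print(thresh, reject, p_value)
```

Since only two scalars cross over to the CPU, the SciPy call should be a negligible part of the overall cost.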

diviyank commented 4 years ago

Hi, thanks! I'll look into it. Best regards, Diviyan

FenTechSolutions / CausalDiscoveryToolbox

ANM, lack of Gamma in "Gamma HSIC" #58