FenTechSolutions / CausalDiscoveryToolbox

Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.
https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html
MIT License

Is HSIC Lasso different from KCI ? #47

Closed ArnoVel closed 5 years ago

ArnoVel commented 5 years ago

Hello, I was wondering whether the independence test in the HSIC Lasso script is the one introduced in the paper "Kernel-based Conditional Independence Test and Application in Causal Discovery". I'm asking because the two seem related (both are based on HSIC), but the code you provide cites neither the Gretton et al. paper nor the one I'm referring to.

If I had to guess, I would bet on a standard independence test, not conditional independence. Is that the case?

Thanks, Arno V.

ArnoVel commented 5 years ago

After looking around, I suppose the KCI-test can be found here

Available heuristics for conditional independence tests:
    + gaussian: "pcalg::gaussCItest"
    + hsic: "kpcalg::kernelCItest"
    + discrete: "pcalg::disCItest"
    + binary: "pcalg::binCItest"
    + randomized: "RCIT:::CItest"
Available CI tests:
    + dcc: "data=X, ic.method=\"dcc\""
    + hsic_gamma: "data=X, ic.method=\"hsic.gamma\""
    + hsic_perm: "data=X, ic.method=\"hsic.perm\""
    + hsic_clust: "data=X, ic.method=\"hsic.clust\""

through R packages? Some additional questions:

diviyank commented 5 years ago

The test used for the HSIC Lasso is a regular independence test, not a conditional independence test. I'll try to add a reference.

The implementation of the PC-HSIC test is the one from the kpcalg package, which refers to:

G. Szekely, M. Rizzo and N. Bakirov (2007). Measuring and Testing Dependence by Correlation of Distances. The Annals of Statistics 2007, Vol. 35, No. 6, 2769-2794.

A. Gretton et al. (2005). Kernel Methods for Measuring Independence. JMLR 6 (2005) 2075-2129.

R. Tillman, A. Gretton and P. Spirtes (2009). Nonlinear directed acyclic structure learning with weakly additive noise model. NIPS 22, Vancouver.

ArnoVel commented 5 years ago

Actually, the IndepTest corresponds to the 'Available heuristics for conditional independence tests:' list, and the 'Available CI test' list corresponds to how the sufficient statistics are computed. For HSIC, the null distribution is estimated with a heuristic: the samples are shuffled, which keeps the marginals unchanged while making the variables independent.
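To make the shuffling heuristic concrete, here is a minimal NumPy sketch of an unconditional HSIC permutation test. It is not the code used by cdt or kpcalg. The function names (`rbf_gram`, `hsic_permutation_test`), the Gaussian kernel with a median-heuristic bandwidth, the biased HSIC estimator and the default of 200 permutations are all assumptions made for illustration.

```python
import numpy as np


def rbf_gram(x, sigma=None):
    """Gaussian (RBF) Gram matrix; bandwidth from the median heuristic if not given."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    if sigma is None:
        positive = d2[d2 > 0]
        # Fall back to 1.0 if all points coincide (assumption for this sketch).
        sigma = np.sqrt(np.median(positive)) if positive.size else 1.0
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic_permutation_test(x, y, n_perm=200, seed=0):
    """Return (biased HSIC statistic, permutation p-value) for samples x and y."""
    rng = np.random.default_rng(seed)
    n = len(x)
    K = rbf_gram(x)
    L = rbf_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H  # centred Gram matrix of x, computed once

    # Biased estimator: trace(K H L H) / n^2 == sum(Kc * L) / n^2.
    stat = np.sum(Kc * L) / n ** 2

    # Null distribution: shuffle y, which preserves both marginals
    # but breaks any dependence between x and y.
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(n)
        null[i] = np.sum(Kc * L[np.ix_(perm, perm)]) / n ** 2

    p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.normal(size=200)
    y_dep = x ** 2 + 0.1 * rng.normal(size=200)  # nonlinear dependence on x
    y_ind = rng.normal(size=200)                 # independent of x
    print("dependent:  ", hsic_permutation_test(x, y_dep))
    print("independent:", hsic_permutation_test(x, y_ind))
```

This tests plain independence between two variables only. A conditional test such as KCI additionally has to account for the conditioning set, which this sketch does not attempt.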

Thank you for this explanation, now things are clear :+1:

diviyank commented 5 years ago

Great, I will close this issue. Don't hesitate to reopen it if you have more questions.