cdt15 / lingam

Python package for causal discovery based on LiNGAM.
https://sites.google.com/view/sshimizu06/lingam
MIT License

Is the bootstrap method implemented here a multiscale one? #95

Open EsqYu opened 1 year ago

EsqYu commented 1 year ago

When using bootstrap to evaluate the causal structure estimated by LiNGAM, multiscale bootstrap is said to be better. Is the bootstrap method provided in this package a multiscale one?

sshimizu2006 commented 1 year ago

Hi, multiscale bootstrap is not implemented in this package for LiNGAM.

Although it is not specific to LiNGAM, a general-purpose R package for multiscale bootstrap is available: https://github.com/shimo-lab/scaleboot

EsqYu commented 1 year ago

Hi, thank you for your reply and for pointing me to the R package for multiscale bootstrap. Can I use this package to evaluate the results that LiNGAM gives?

sshimizu2006 commented 1 year ago

Yes. Basically, it would be something like computing bootstrap probabilities with different numbers of bootstrap resampling and then giving them to the R code.

EsqYu commented 1 year ago

Thanks for your reply. Does "computing bootstrap probabilities with different numbers of bootstrap resampling" mean executing the bootstrap() method with different values of n_sampling, like the following?

1) model.bootstrap(X, n_sampling=100)
2) model.bootstrap(X, n_sampling=200)
3) model.bootstrap(X, n_sampling=300)
4) give results 1 to 3 to the R code

sshimizu2006 commented 1 year ago

Yes, something like that, though I am not very familiar with the details of the R package.
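
For readers following this exchange, here is a minimal sketch of what per-scale bootstrap probabilities could look like. It rests on an assumption not confirmed in this thread: in multiscale bootstrap as usually described, the quantity that varies is the size of each resample (with scale sigma^2 = n / m for resample size m), whereas `n_sampling` in `bootstrap()` appears to set the number of replicates. The function name `multiscale_bootstrap_probabilities`, the `hypothesis_holds` callback, and the chosen ratios are all hypothetical, and the toy hypothesis (a positive sample mean) merely stands in for "this edge is found by LiNGAM". It uses only the standard library.

```python
import random
from typing import Callable, Dict, List, Sequence


def multiscale_bootstrap_probabilities(
    n_samples: int,
    hypothesis_holds: Callable[[List[int]], bool],
    ratios: Sequence[float] = (0.5, 0.75, 1.0, 1.25, 1.5),
    n_boot: int = 200,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Estimate a bootstrap probability at several resample sizes.

    For each ratio r, draw n_boot resamples of size m = round(r * n_samples)
    with replacement and record how often hypothesis_holds(indices) is True.
    The reported scale sigma2 = n_samples / m is an assumed convention.
    """
    rng = random.Random(seed)
    results = []
    for ratio in ratios:
        m = max(1, round(ratio * n_samples))
        hits = 0
        for _ in range(n_boot):
            # Row indices of one bootstrap resample of size m.
            idx = [rng.randrange(n_samples) for _ in range(m)]
            if hypothesis_holds(idx):
                hits += 1
        results.append(
            {"resample_size": m, "sigma2": n_samples / m, "bp": hits / n_boot}
        )
    return results


# Toy data and toy hypothesis (placeholder for a LiNGAM-based check).
data_rng = random.Random(1)
data = [data_rng.gauss(0.2, 1.0) for _ in range(100)]


def mean_is_positive(idx: List[int]) -> bool:
    return sum(data[i] for i in idx) / len(idx) > 0


results = multiscale_bootstrap_probabilities(len(data), mean_is_positive)
for row in results:
    print(row)
```

With LiNGAM, `hypothesis_holds` could refit `lingam.DirectLiNGAM()` on the rows selected by `idx` and check whether a chosen entry of `adjacency_matrix_` is nonzero. That means one fit per replicate per scale, so it can be slow. The exact input format that scaleboot expects should be checked against its own documentation.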

EsqYu commented 1 year ago

Thank you very much for your kind replies. I'll try using the package. By the way, I have one more question. I believe there are two options when selecting the model, 'pwling' or 'kernel'. How should I choose between them?

sshimizu2006 commented 1 year ago

Basically, pwling would be the first choice since it is faster to compute.

EsqYu commented 1 year ago

Okay. Then what is the advantage of the kernel version?

sshimizu2006 commented 1 year ago

kernel is a kind of nonparametric estimator of independence. pwling makes some distributional assumptions, such as super-Gaussian distributions. For details, see these references: https://www.jmlr.org/papers/v14/hyvarinen13a.html for pwling, and https://www.jmlr.org/papers/v3/bach02a.html for kernel.
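
As an illustration of how the two options might be selected in code, here is a small sketch. It assumes that 'pwling' and 'kernel' are passed through the `measure` argument of `lingam.DirectLiNGAM`, which matches the package as far as is known but is not stated in this thread. The wrapper name `fit_direct_lingam` is hypothetical.

```python
def fit_direct_lingam(X, measure="pwling"):
    """Fit DirectLiNGAM with the chosen independence measure and return the model.

    X is expected to be an array-like of shape (n_samples, n_features).
    """
    if measure not in ("pwling", "kernel"):
        raise ValueError("measure must be 'pwling' or 'kernel'")
    # Imported lazily so the argument check above works without lingam installed.
    import lingam

    model = lingam.DirectLiNGAM(measure=measure)
    model.fit(X)
    return model


# Example (requires numpy and lingam to be installed):
#   model = fit_direct_lingam(X, measure="kernel")
#   print(model.causal_order_)
#   print(model.adjacency_matrix_)
```

Fitting with both measures on the same data and comparing the estimated causal orders is one simple way to see whether the choice matters for a given dataset.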

EsqYu commented 1 year ago

Thanks for your reply and for sharing the URLs. I thought I should choose between the two methods depending on features of the input data, such as skewness or kurtosis. Is that not the case?

sshimizu2006 commented 1 year ago

Yeah, you can try both of them depending on the nature of the distributions of variables.
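
One rough way to look at "the nature of the distributions" is to compute the sample skewness and excess kurtosis of each variable. The sketch below uses only the standard library. The function name is hypothetical, and reading positive excess kurtosis as a sign of a super-Gaussian variable is a common heuristic rather than a rule given in this thread.

```python
import random
from typing import Sequence, Tuple


def skewness_and_excess_kurtosis(values: Sequence[float]) -> Tuple[float, float]:
    """Return (skewness, excess kurtosis) from central sample moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a Gaussian
    return skew, excess_kurt


rng = random.Random(0)
# Laplace samples (difference of two exponentials): heavier tails than Gaussian.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)]
# Uniform samples: lighter tails than Gaussian.
uniform = [rng.uniform(-1.0, 1.0) for _ in range(20000)]

laplace_stats = skewness_and_excess_kurtosis(laplace)
uniform_stats = skewness_and_excess_kurtosis(uniform)
print("laplace (skew, excess kurtosis):", laplace_stats)
print("uniform (skew, excess kurtosis):", uniform_stats)
```

These statistics only describe the observed variables. They do not by themselves say which measure will recover the correct structure, so trying both measures, as suggested above, remains a reasonable check.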

EsqYu commented 11 months ago

Thank you. I tried multiscale bootstrapping using the R package and got some results, but the example uses log-likelihood values as input. Is it appropriate to use bootstrap probabilities as input? I'm also wondering how many bootstrap replicates I should run before using this package.