Closed: soya-beancurd closed this issue 1 year ago.
I guess that no implementation satisfying both of those points is available, even in other packages. One possibility is to modify an implementation based on conditional independence, like PC or FCI, by changing the conditional independence test depending on the types of the variables being tested (https://link.springer.com/article/10.1007/s41060-018-0097-y). https://www.jstatsoft.org/article/view/v080i07 might also be helpful. But in your case it seems the treatment and outcome cannot cause the other continuous variables, so your way of using prior knowledge to analyze the mixed data should be ok.
Your dataset has many rows, and HSIC gets much slower for larger sample sizes. There might be other, faster statistical independence tests (e.g., https://arxiv.org/abs/1804.02747), but I haven't used them.
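As one illustration of keeping a kernel test tractable at this scale, below is a minimal sketch of an HSIC-style permutation test run on a random subsample, which bounds the O(n²) kernel cost. This is an assumed, simplified construction (RBF kernels, median-heuristic bandwidth, 100 permutations), not lingam's `hsic_test_gamma`:

```python
# Sketch: HSIC statistic with a permutation p-value, computed on a random
# subsample so the n x n Gram matrices stay small. Illustration only.
import numpy as np

def _rbf_gram(x):
    """RBF Gram matrix of a 1-D array, median-distance heuristic bandwidth."""
    d2 = (x[:, None] - x[None, :]) ** 2
    med = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2 * med))

def hsic_perm_test(x, y, n_sub=500, n_perm=100, seed=0):
    """Biased HSIC estimate + permutation p-value on a subsample of size n_sub."""
    rng = np.random.default_rng(seed)
    if len(x) > n_sub:                      # subsample to bound O(n^2) cost
        idx = rng.choice(len(x), n_sub, replace=False)
        x, y = x[idx], y[idx]
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    K = H @ _rbf_gram(x) @ H                # doubly centered kernel of x
    L = _rbf_gram(y)
    stat = np.sum(K * L) / n ** 2           # trace(K_c L) / n^2
    count = 0
    for _ in range(n_perm):                 # permute y to simulate the null
        p = rng.permutation(n)
        count += np.sum(K * L[p][:, p]) / n ** 2 >= stat
    return (count + 1) / (n_perm + 1)       # permutation p-value
```

Subsampling trades power for speed; on a 250k-row dataset one would typically repeat this over several independent subsamples rather than trusting a single draw.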
Well, ICA-LiNGAM does not allow prior knowledge to be used; this might have made the difference. If you want more sparseness, this issue might be helpful: https://github.com/cdt15/lingam/issues/68
P.S. It would be better to use a different method, e.g., something implemented in DoWhy, to compute the causal effects of your continuous variables on the binary treatment from the estimated causal graph, rather than using the output of DirectLiNGAM: DirectLiNGAM assumes all the variables are continuous when it computes causal effects.
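To make this concrete, here is a hypothetical sketch of backdoor-style regression adjustment for a binary treatment on simulated data. DoWhy's backdoor estimators implement more refined versions of the same idea; all variable names and numbers below are made up for illustration:

```python
# Sketch: with continuous confounders x of a binary treatment t, the naive
# mean contrast is biased, while regression adjustment (a simple backdoor
# estimator) recovers the treatment effect. Simulated toy data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))                                  # confounders
t = rng.binomial(1, 1 / (1 + np.exp(-(x[:, 0] - x[:, 1]))))  # binary treatment
y = 2.0 * t + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)   # true effect = 2

naive = y[t == 1].mean() - y[t == 0].mean()   # confounded contrast
# Linear outcome model on (t, x): the coefficient on t is the adjusted effect.
ate = LinearRegression().fit(np.column_stack([t, x]), y).coef_[0]
print(round(naive, 2), round(ate, 2))         # naive is biased upward; ate ~ 2
```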
Thanks for your reply!
With regards to your first reply, I'll try replacing `hsic_test_gamma` with the test from the paper you suggested for independence tests! (probably using the fcit Python package from the author of the FCIT paper). Out of curiosity, are there any plans to incorporate other forms of independence tests, such as FCIT, into the lingam package?
Increasing the sparsity (`gamma = 2` in adaptive lasso) unfortunately does not seem to alter the abovementioned trend in the results of DirectLiNGAM.
With regards to your second reply, I am currently only using the output of DirectLiNGAM as our estimated causal graph.
For instance, the adjacency matrix from DirectLiNGAM is converted to a NetworkX DiGraph before being passed directly to DoWhy, which interprets every non-zero entry as an edge and every zero entry as no edge. DoWhy's backdoor criterion is then applied to identify confounders within this graph. Therefore, in a way, I am only relying on the pattern of zero and non-zero values in DirectLiNGAM's adjacency matrix (the magnitude and sign of these values do not matter, I guess), and not on the causal effects that the LiNGAM class provides.
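That conversion could be sketched as follows. Note this assumes lingam's convention that a non-zero entry B[i, j] means variable j causes variable i (edges run column → row), which is worth double-checking against your package version:

```python
# Sketch: turn a (hypothetical) DirectLiNGAM adjacency matrix into a
# NetworkX DiGraph, keeping only the zero / non-zero pattern.
import numpy as np
import networkx as nx

def adjacency_to_digraph(B, labels):
    g = nx.DiGraph()
    g.add_nodes_from(labels)
    for i, j in zip(*np.nonzero(B)):
        g.add_edge(labels[j], labels[i])   # cause (column) -> effect (row)
    return g

# Toy matrix encoding x1 -> x0 and x1 -> T.
B = np.array([[0.0, 1.2, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.7, 0.0]])
g = adjacency_to_digraph(B, ["x0", "x1", "T"])
print(sorted(g.edges()))   # [('x1', 'T'), ('x1', 'x0')]
```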
Were your concerns referring to the fact that the values in the adjacency matrix (not just the zeros and non-zeros) are still employed during independence testing (i.e., `hsic_test_gamma`, or even FCIT)?
Thanks once again for the speedy and helpful replies!
Ok, then the directed edges from the continuous variables to the binary treatment might not be properly pruned. DirectLiNGAM uses sparse linear regression to prune directed edges, assuming all the variables are continuous. Another method, such as sparse logistic regression with the continuous variables as explanatory variables and the treatment as the response variable, would be better for estimating the existence of directed edges from those continuous variables to the binary treatment (and outcome), though DirectLiNGAM can still estimate the causal structure among the continuous variables.
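A minimal sketch of this idea, using scikit-learn's L1-penalized logistic regression on simulated data (the variable setup and the sparsity level `C` are illustrative assumptions):

```python
# Sketch: decide which continuous variables have a directed edge into the
# binary treatment by fitting a sparse (L1) logistic regression and reading
# off the non-zero coefficients. Toy data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 3000
X = rng.normal(size=(n, 4))                 # 4 continuous candidate parents
# In this toy setup, the treatment truly depends on columns 0 and 2 only.
t = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * X[:, 0] - 1.0 * X[:, 2]))))

# L1-penalized logistic regression; smaller C means stronger sparsity.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.01)
model.fit(X, t)
parents = np.flatnonzero(np.abs(model.coef_[0]) > 1e-8)
print(parents)   # expected to recover columns 0 and 2
```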
Based on your recommendation, would such a scenario below work?
Sparse logistic regression, where adaptive lasso (`predict_adaptive_lasso`) is used:
log(Y / (1 - Y)) = β1·X1 + β2·X2 + βT·T + ...
We do not worry about the contribution/coefficient (β) of T and Y here, as βT and βY will be 0 according to the prior matrix.
Is it therefore safe to assume the β obtained from the sparse logistic regression above can be used as the values for the adjacency matrix?
My suggestion would be something like:
1. Remove all binary variables, apart from the treatment (T) and outcome (Y).
2. Run DirectLiNGAM on all the continuous variables to get a causal graph of the continuous variables.
3. Run an adaptive logistic regression (Section 4.1 of the original adaptive lasso paper: http://users.stat.umn.edu/~zouxx019/Papers/adalasso.pdf) with all the continuous variables as the features and the binary treatment as the target; do the same for the binary outcome. Draw directed edges from the continuous variables to the treatment and the outcome based on the sparsity patterns of the adaptive logistic regression coefficients.
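The adaptive logistic regression step above could be sketched with the usual reweighting construction of the adaptive lasso (fit an initial model, rescale features by |initial coefficient|^γ, then fit an L1 model). This is an illustration on simulated data, not lingam's `predict_adaptive_lasso`:

```python
# Sketch: adaptive-lasso-flavored logistic regression via feature rescaling.
# Features with small initial coefficients get penalized more heavily, so
# their coefficients are driven to exactly zero. Toy data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

def adaptive_logistic(X, y, gamma=1.0, C=0.05):
    # Step 1: initial (ridge-penalized) logistic fit gives adaptive weights.
    w0 = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    scale = np.abs(w0) ** gamma
    # Step 2: L1 logistic on rescaled features, then map back.
    m = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    m.fit(X * scale, y)
    return m.coef_[0] * scale               # coefficients on original scale

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 4))
# Binary "treatment" truly caused by columns 0 (positive) and 2 (negative).
t = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * X[:, 0] - 1.0 * X[:, 2]))))
beta = adaptive_logistic(X, t)
print(np.flatnonzero(np.abs(beta) > 1e-8))  # estimated parents of the treatment
```

The sparsity pattern of `beta` would then give the directed edges into the treatment; repeating with the outcome as the target gives its incoming edges.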
Hi Dr. Shimizu, the plan that you've suggested seems to be going well so far, and there aren't any downstream issues (e.g., in confounder identification and causal effect estimation) as of now! Thanks so much for your assistance and quick replies!!
I've also tried replacing HSIC with unconditional FCIT (the fcit package), which does not seem to have caused any OOM issues thus far!
However, I'd still like to clarify some doubts about your implementation of HSIC:
Thank you!
Hi,
DirectLiNGAM tries to find a DAG that minimizes the dependence between error terms. DirectLiNGAM does not use HSIC to prune edges; rather, HSIC is used to check whether the error terms in the estimated DAG are independent. This is to find possible violations of the independence assumption.
Oh I see, thanks for the clarification! Do you then think it's possible to use such independence tests (HSIC or FCIT) to prune edges derived from the adjacency matrix of DirectLiNGAM as described in my question above?
Yeah, that could be an alternative way for pruning edges.
Hello, I’d first like to thank you for this incredible package (along with the interesting papers on LiNGAM you’ve published)!
I’m currently trying to employ this package in my Causal Inference pipeline (causal discovery portion).
More specifically, I am currently using DirectLiNGAM with a prior-knowledge matrix (specifically, one enforcing an edge from the treatment to the outcome variable, and that there should be no other outgoing edges from either the treatment or the outcome variable). BottomUpParceLiNGAM would have been the ideal model, but it doesn't work here due to scalability problems and instant out-of-memory issues.
After running a couple of experiments with DirectLiNGAM, I have 3 questions I'd like to clarify with you if possible:
Is there any way to speed up `get_error_independence_p_values` and `bootstrap`? The former (specifically during `hsic_test_gamma`) causes out-of-memory issues for even the smallest dataset (e.g., 250k x 155), while the latter takes too long (i.e., the default `fit` in DirectLiNGAM with the above-mentioned prior-knowledge matrix takes between 20 hours and 5 days for the datasets I currently have).
Thank you very much!