cdt15 / lingam

Python package for causal discovery based on LiNGAM.
https://sites.google.com/view/sshimizu06/lingam
MIT License

no_paths doesn't work when using prior_knowledge #11

Open kargo113 opened 3 years ago

kargo113 commented 3 years ago

Hi,

I really appreciate this repository because it lets me apply LiNGAM to my system very quickly.

I have one question: does no_paths work correctly?

Following the example in this notebook: https://github.com/cdt15/lingam/blob/master/examples/DirectLiNGAM(PriorKnowledge).ipynb,

I generate the prior knowledge:

prior_knowledge = make_prior_knowledge(
    n_variables=6,
    exogenous_variables=[0],
    no_paths=[[2,1]])
print(prior_knowledge)
make_prior_knowledge_graph(prior_knowledge)

The output is:

[[ 0  0  0  0  0  0]
 [-1  0  0 -1 -1 -1]
 [-1 -1  0 -1 -1 -1]
 [-1 -1 -1  0 -1 -1]
 [-1 -1 -1 -1  0 -1]
 [-1 -1 -1 -1 -1  0]]

It seems the path "2 -> 1" is zero.
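To make the reading of that matrix explicit, here is a minimal NumPy sketch that rebuilds it by hand. The helper `build_prior_knowledge` is a hypothetical stand-in written for this illustration, not the library's `make_prior_knowledge`; the indexing convention (row = effect, column = cause, 0 = no directed path, -1 = unknown) is an assumption read off the printed output above.

```python
import numpy as np

def build_prior_knowledge(n_variables, exogenous_variables=(), no_paths=()):
    """Hypothetical stand-in that reproduces the matrix printed above."""
    pk = np.full((n_variables, n_variables), -1, dtype=int)  # -1: nothing known
    np.fill_diagonal(pk, 0)
    for v in exogenous_variables:
        pk[v, :] = 0           # no variable may have a path into an exogenous one
    for src, dst in no_paths:
        pk[dst, src] = 0       # forbid a directed path src -> dst
    return pk

pk = build_prior_knowledge(6, exogenous_variables=[0], no_paths=[[2, 1]])
print(pk)

# List every forbidden (source, target) pair encoded by the zeros.
forbidden = [(int(j), int(i)) for i, j in zip(*np.where(pk == 0)) if i != j]
print(forbidden)
```

Under this reading, the zero at row 1, column 2 is the entry that encodes "no path 2 -> 1".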

However, after fitting the model to the data,

model = lingam.DirectLiNGAM(prior_knowledge=prior_knowledge)
model.fit(X)

the output of model.adjacency_matrix_ is:


          0    1          2         3    4    5
0  0.000000  0.0   0.000000  0.000000  0.0  0.0
1  2.986726  0.0   2.006062  0.000000  0.0  0.0
2  0.000000  0.0   0.000000  6.016333  0.0  0.0
3  0.299046  0.0   0.000000  0.000000  0.0  0.0
4  7.984485  0.0  -0.990590  0.000000  0.0  0.0
5  3.952478  0.0   0.000000  0.000000  0.0  0.0

The path "2 -> 1" has value 2.006062.

Is this the correct output? I would expect the value to be zero when "no_paths" is used.
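A check like this can be automated. The following is a rough sketch, assuming the convention visible in the matrices above (entry [i, j] describes the relation from variable j to variable i); `find_violations` is a hypothetical helper and not part of the lingam package. It computes which variables can reach which others in the estimated graph and reports every pair that the prior knowledge marks with 0.

```python
import numpy as np

def find_violations(adjacency, prior_knowledge, tol=1e-8):
    """Return (source, target) pairs forbidden by prior knowledge but present in the graph.

    Assumed convention: adjacency[i, j] != 0 is an edge j -> i, and
    prior_knowledge[i, j] == 0 means "no directed path j -> i".
    """
    reach = np.abs(np.asarray(adjacency, dtype=float)) > tol
    n = reach.shape[0]
    # Warshall-style transitive closure: reach[i, j] is True if a path j -> i exists.
    for k in range(n):
        reach |= reach[:, [k]] & reach[[k], :]
    pk = np.asarray(prior_knowledge)
    return [(j, i) for i in range(n) for j in range(n)
            if i != j and pk[i, j] == 0 and reach[i, j]]

# Values copied from the outputs quoted in this thread.
prior_knowledge = np.array([
    [0, 0, 0, 0, 0, 0],
    [-1, 0, 0, -1, -1, -1],
    [-1, -1, 0, -1, -1, -1],
    [-1, -1, -1, 0, -1, -1],
    [-1, -1, -1, -1, 0, -1],
    [-1, -1, -1, -1, -1, 0],
])
adjacency = np.array([
    [0.000000, 0.0, 0.000000, 0.000000, 0.0, 0.0],
    [2.986726, 0.0, 2.006062, 0.000000, 0.0, 0.0],
    [0.000000, 0.0, 0.000000, 6.016333, 0.0, 0.0],
    [0.299046, 0.0, 0.000000, 0.000000, 0.0, 0.0],
    [7.984485, 0.0, -0.990590, 0.000000, 0.0, 0.0],
    [3.952478, 0.0, 0.000000, 0.000000, 0.0, 0.0],
])

violations = find_violations(adjacency, prior_knowledge)
print(violations)
```

The closure step matters because no_paths forbids directed paths, not only direct edges, so an indirect route through a third variable would also count as a violation.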

sshimizu2006 commented 3 years ago

Hi, this prior knowledge option does not necessarily force the estimated graph to satisfy the prior knowledge given by the user. The DirectLiNGAM algorithm implemented in this library estimates the causal order of the variables one by one. Therefore, if, for example, the estimation of the causal order of some variables fails before the variables covered by the prior knowledge are reached, the prior knowledge sometimes cannot be used, or the algorithm may have to estimate some causal orders that might be wrong. Prior knowledge about exogenous variables and sink variables is more likely to be reflected in the output of DirectLiNGAM. Although this "soft" way of using prior knowledge might differ from what some users expect, we thought this option would still help improve the estimation.

kargo113 commented 3 years ago

Thank you for your answer. I understand that, with this "soft" approach, no_paths does not force the value to zero.

Is there a way to ensure that certain path coefficients are zero, or at least very small?

I ask because some coefficients must be zero when LiNGAM is applied to a business problem. In other words, even when there is obviously no causal effect between variable A and variable B, LiNGAM sometimes estimates a coefficient that is not zero but large.

sshimizu2006 commented 3 years ago

1) DirectLiNGAM roughly consists of two steps. First, it estimates the causal order of the variables. Second, it estimates the coefficients. If the estimated causal order is acceptable, a possible compromise is to fix the coefficient from A to B to zero and estimate the remaining coefficients based on the estimated causal order. This can be done with a traditional path analysis or structural equation modeling package.

2) Another way might be to compute the bootstrap probability of the directed edge from A to B. That probability might turn out not to be very large.
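Both suggestions can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's implementation: `fit_given_order` and `bootstrap_edge_probability` are hypothetical helpers, ordinary least squares stands in for a path analysis package, the data are synthetic, and the causal order is held fixed during resampling, whereas a real analysis would rerun the whole causal discovery on each resample.

```python
import numpy as np

def fit_given_order(X, causal_order, forbidden=()):
    """Regress each variable on its predecessors in causal_order by OLS.

    forbidden holds (source, target) pairs whose coefficient is fixed to zero
    by leaving the source out of the target's regression.
    Returns B with B[i, j] = coefficient from variable j to variable i.
    """
    n = X.shape[1]
    B = np.zeros((n, n))
    Xc = X - X.mean(axis=0)            # centre so that no intercept is needed
    forbidden = set(forbidden)
    for pos, target in enumerate(causal_order):
        parents = [p for p in causal_order[:pos] if (p, target) not in forbidden]
        if parents:
            coef, *_ = np.linalg.lstsq(Xc[:, parents], Xc[:, target], rcond=None)
            B[target, parents] = coef
    return B

def bootstrap_edge_probability(X, fit_fn, src, dst, n_sampling=100,
                               min_effect=0.05, seed=0):
    """Fraction of bootstrap resamples in which |B[dst, src]| exceeds min_effect."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sampling):
        idx = rng.integers(0, len(X), size=len(X))   # resample rows with replacement
        hits += int(abs(fit_fn(X[idx])[dst, src]) > min_effect)
    return hits / n_sampling

# Synthetic data (assumption): x0 -> x1 and x0 -> x2, with no effect between x1 and x2.
rng = np.random.default_rng(0)
n = 1000
x0 = rng.uniform(-1, 1, n)
x1 = 2.0 * x0 + rng.uniform(-1, 1, n)
x2 = 3.0 * x0 + rng.uniform(-1, 1, n)
X = np.column_stack([x0, x1, x2])
order = [0, 1, 2]

# Suggestion 1: fix the coefficient x1 -> x2 to zero and re-estimate the rest.
B_constrained = fit_given_order(X, order, forbidden=[(1, 2)])

# Suggestion 2: bootstrap probabilities of a real edge and of a spurious one.
unconstrained = lambda sample: fit_given_order(sample, order)
p_real = bootstrap_edge_probability(X, unconstrained, src=0, dst=1)
p_spurious = bootstrap_edge_probability(X, unconstrained, src=1, dst=2, min_effect=0.3)
print(B_constrained, p_real, p_spurious)
```

The threshold `min_effect` is a judgment call: an edge that only rarely exceeds it across resamples is a candidate for being treated as absent.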

kargo113 commented 3 years ago

Thanks for your suggestion. I understand these solutions.

Thank you so much.

sshimizu2006 commented 3 years ago

Now, in v1.5.2, you can FORCE prior knowledge about causal ORDERS to be respected in the estimation, e.g., that x1 cannot cause x2.

kargo113 commented 3 years ago

Thank you for adding the option to force prior knowledge. I'll try this method when analyzing our data.