[Question] What is the difference between probabilities output from result.get_probabilities() and result.get_total_causal_effects()?

cdt15 / lingam

Python package for causal discovery based on LiNGAM.

https://sites.google.com/view/sshimizu06/lingam

MIT License


[Question] What is the difference between probabilities output from result.get_probabilities() and result.get_total_causal_effects()? #117

Open lhl881210 opened 7 months ago

lhl881210 commented 7 months ago

Hi, I am a beginner. I'm not quite sure about the difference between the probabilities output by result.get_probabilities() and result.get_total_causal_effects() after bootstrapping, i.e., result = model.bootstrap(data, n_sampling=1000). I would appreciate it if you could give me more information.

sshimizu2006 commented 7 months ago

result.get_probabilities() gives the bootstrap probabilities of whether direct effects are non-zero (directed edges exist). result.get_total_causal_effects() gives the bootstrap probabilities of whether total effects are non-zero (directed paths exist).
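
To make the distinction concrete, here is a minimal sketch in plain Python. It is a conceptual illustration only, not lingam's internal code: the four "bootstrap" adjacency matrices are made up, `total_effect`, `edge_probability` and `path_probability` are hypothetical helper names, and the sketch assumes the convention that `B[i][j]` is the direct effect of x_j on x_i.

```python
# Conceptual sketch only; NOT lingam's implementation.
# Assumed convention: B[i][j] is the direct effect of x_j on x_i.

def total_effect(B, src, dst):
    """Sum, over all directed paths src -> ... -> dst, of the product of edge weights."""
    if src == dst:
        return 1.0
    return sum(
        B[k][src] * total_effect(B, k, dst)
        for k in range(len(B))
        if B[k][src] != 0
    )

# Hypothetical estimates from four bootstrap samples (three variables).
chain = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]        # x0 -> x1 -> x2
direct_only = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # x0 -> x1, x0 -> x2
no_path = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # x0 -> x1 only
samples = [chain, direct_only, no_path, chain]

def edge_probability(samples, src, dst, min_effect=0.01):
    """Fraction of samples in which the direct edge src -> dst is non-zero."""
    hits = sum(abs(B[dst][src]) > min_effect for B in samples)
    return hits / len(samples)

def path_probability(samples, src, dst, min_effect=0.01):
    """Fraction of samples in which the total effect of src on dst is non-zero."""
    hits = sum(abs(total_effect(B, src, dst)) > min_effect for B in samples)
    return hits / len(samples)

# In the chain, x0 has no direct edge to x2, but it has a total effect of 2 * 3.
direct_02 = chain[2][0]
total_02 = total_effect(chain, 0, 2)

p_edge = edge_probability(samples, 0, 2)  # direct edge x0 -> x2 appears in 1 of 4 samples
p_path = path_probability(samples, 0, 2)  # some directed path x0 -> x2 in 3 of 4 samples
print(direct_02, total_02, p_edge, p_path)
```

In this toy setup the edge x0 -> x2 is rarely present, yet a directed path from x0 to x2 exists in most samples, so the two probabilities differ.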

lhl881210 commented 7 months ago

Shimizu-sensei, thank you for your answer. I have a follow-up question. As far as I understand, the total causal effect along a path can only be calculated with respect to a specific DAG. However, since the bootstrap outputs multiple DAGs, which DAG is the output of get_total_causal_effects() based on? Thanks.

sshimizu2006 commented 7 months ago

Hi, those total effects in the bootstrap outputs are the medians over the bootstrap samples. You can find all the bootstrap results here: https://lingam.readthedocs.io/en/latest/reference/bootstrap.html
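
In other words, no single DAG is used: each bootstrap sample yields its own estimated graph and its own total effect, and the reported value summarizes them. A minimal sketch with made-up numbers follows. The list `effects_per_sample` is hypothetical, and whether samples with a zero effect are included in the median is an assumption worth checking against the lingam source, so both variants are shown.

```python
from statistics import median

# Hypothetical total effects of x0 on x2, one value per bootstrap sample.
# Each value would come from the DAG estimated on that particular resample.
effects_per_sample = [6.0, 0.0, 5.5, 1.0, 6.5]

min_effect = 0.01  # threshold below which an effect is treated as zero (assumed)

non_zero = [e for e in effects_per_sample if abs(e) > min_effect]

# Bootstrap probability that a directed path exists (effect is non-zero).
probability = len(non_zero) / len(effects_per_sample)

# Two possible summaries of the effect size.
median_all = median(effects_per_sample)  # median over every sample
median_non_zero = median(non_zero)       # median over samples where a path exists

print(probability, median_all, median_non_zero)
```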

lhl881210 commented 6 months ago

Thank you very much for your reply.

I have another question. I have a data set from a questionnaire with 6 questions. These are discrete variables, collected using a 5-point Likert scale. I also have 3 types of behavioral data, such as time spent on task, which are continuous variables. I want to do causal discovery on these 6 discrete variables and 3 continuous variables.

I'm wondering if it's appropriate to use DirectLiNGAM for this kind of data.

I ask because I know that the original LiNGAM as well as ICA-LiNGAM require the data to be continuous variables, but in your DirectLiNGAM tutorial the requirement for continuous variables seems to have been removed. https://lingam.readthedocs.io/en/latest/tutorial/lingam.html

Thanks again for your help.

sshimizu2006 commented 6 months ago

If your discrete variables are collected using a 5-point Likert scale, it would be OK to use DirectLiNGAM, treating them as approximately continuous.

DirectLiNGAM assumes that the variables are continuous. The error variables are continuous, so their linear sums, i.e., the observed variables, are also continuous.
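
As a rough illustration of this point, the sketch below generates two variables from a linear model with continuous, non-Gaussian errors, then codes one of them on a 5-point scale. Everything here is an assumption made for illustration: the coefficient 2.0, the uniform errors, and the hypothetical `to_likert` helper (equal-width binning) are not taken from lingam or from the questioner's data.

```python
import random

random.seed(0)

def to_likert(value, low, high):
    """Hypothetical coding: cut [low, high] into 5 equal-width bins labelled 1..5."""
    if high == low:
        return 3
    idx = int((value - low) / (high - low) * 5)
    return min(max(idx, 0), 4) + 1

n = 1000

# Continuous, non-Gaussian error variables.
e0 = [random.uniform(-1, 1) for _ in range(n)]
e1 = [random.uniform(-1, 1) for _ in range(n)]

# Observed variables are linear sums of the errors, so they are continuous too.
x0 = e0
x1 = [2.0 * a + b for a, b in zip(x0, e1)]

# A Likert item can be viewed as a coarse version of such a continuous variable.
x1_likert = [to_likert(v, min(x1), max(x1)) for v in x1]

print(x1[:3])
print(x1_likert[:3])
```

Treating the 5-point codes as approximately continuous amounts to using `x1_likert` in place of `x1`, accepting the loss of resolution.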

lhl881210 commented 6 months ago

Thank you so much for your quick reply!