cmu-phil / tetrad

Repository for the Tetrad Project, www.phil.cmu.edu/tetrad.
GNU General Public License v2.0

What does Tetrad do to discover causal graph with mixed data? #1388

Closed williamty closed 2 years ago

williamty commented 2 years ago

There are many approaches to discovering a DAG from mixed data or real-world data. Which algorithm does Tetrad use?

jdramsey commented 2 years ago

For mixed data in particular? We have two well-tested methods that can be used as tests or scores for mixed data: Conditional Gaussian and Degenerate Gaussian. Just a second, I'll grab the references for you:

Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian networks of mixed variables. International Journal of Data Science and Analytics, 6(1), 3-18.

Andrews, B., Ramsey, J., & Cooper, G. F. (2019, July). Learning high-dimensional directed acyclic graphs with mixed data-types. In The 2019 ACM SIGKDD Workshop on Causal Discovery (pp. 4-21). PMLR.

With these you can use any algorithm that uses a test or a score to infer a causal graph. I'm a little worried about your using the current version of causal-cmd; there were some adjustments in 7.1.0, which is what the Tetrad app uses. Also, datasets with missing values load in the Tetrad app (and apparently not in causal-cmd). Is there any chance you could use the Tetrad app, at least for now, to do your analyses?

Also, what is your strategy for dealing with the missing values?

jdramsey commented 2 years ago

In those papers there is a review of the literature on the mixed-variable case as of the time those methods were published. I corresponded with someone who was implementing Conditional Gaussian in R, following the implementation in Tetrad; I have not had a chance yet to test that implementation myself.

williamty commented 2 years ago

Thank you very much for your immediate reply! I'm now trying the R package pcalg instead of causal-cmd. The Tetrad app is a graphical interface, so I can't see the specific techniques behind it, which is a big problem for my analysis.

williamty commented 2 years ago

There's still one question. I noticed that PC and FCI can be used for mixed data in the Tetrad app. Aren't they constraint-based algorithms? If so, these algorithms can't use CG or DG to preprocess mixed data. Am I wrong?

williamty commented 2 years ago

I used Markov chain Monte Carlo (MCMC) to impute the missing values first, but I'm not sure this is the best choice. I think Tetrad may delete the samples with missing data, or use some regression to impute the missing values, which is similar to MCMC.
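For readers comparing the two strategies guessed at here, the following is a minimal Python sketch of listwise deletion next to a very simple per-column imputation. It is illustrative only: the thread does not confirm how Tetrad handles missing values, the function names and the `MISSING` marker are hypothetical, and the mean/mode fill is a much cruder stand-in for MCMC or regression-based imputation.

```python
from collections import Counter
from statistics import mean

MISSING = None  # assumption: missing cells are represented as None


def listwise_delete(rows):
    """Drop every row (sample) that has at least one missing cell."""
    return [row for row in rows if MISSING not in row]


def simple_impute(rows):
    """Fill missing cells with the column mean (numeric) or mode (discrete)."""
    fills = []
    for col in zip(*rows):
        observed = [v for v in col if v is not MISSING]
        if all(isinstance(v, (int, float)) for v in observed):
            fills.append(mean(observed))  # continuous column
        else:
            fills.append(Counter(observed).most_common(1)[0][0])  # discrete column
    return [[fills[j] if v is MISSING else v for j, v in enumerate(row)]
            for row in rows]


if __name__ == "__main__":
    data = [[1.0, "a"], [None, "b"], [3.0, None], [5.0, "b"]]
    print(listwise_delete(data))
    print(simple_impute(data))
```

Listwise deletion keeps only fully observed samples, so it can discard a lot of data; the simple fill keeps every sample but ignores relationships between variables, which is what model-based imputation tries to preserve.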

jdramsey commented 2 years ago

Oh, BTW, the Tetrad app uses the same Tetrad code as causal-cmd; it's just more up to date currently, since it's using 7.1.0. The results should be the same once causal-cmd is updated. I put considerable effort into getting rid of all known Tetrad bugs in 7.1.0: 183 fixes! Took a while. But it should be better, and should be compatible with causal-cmd once the latter is updated.

PC and FCI are constraint-based algorithms, but Conditional Gaussian and Degenerate Gaussian are also implemented as tests (likelihood ratio tests, that is, using the same likelihood functions as for the corresponding scores). So they should be fine.
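To make the idea of a likelihood ratio test concrete, here is a small Python sketch of one way such a test can work for mixed data. It is not Tetrad's code, and every function name in it is hypothetical. It assumes the tested variable X is continuous, embeds a discrete variable Y as one-hot indicator columns (in the spirit of the Degenerate Gaussian approach), and compares the Gaussian log-likelihood of X given Z with and without Y. The resulting statistic is referred to a chi-square distribution whose degrees of freedom equal the number of added indicator columns.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gaussian_loglik(y, columns):
    """Maximized Gaussian log-likelihood of y regressed on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(bj * v for bj, v in zip(beta, row))) ** 2
              for i, row in enumerate(design))
    sigma2 = rss / n  # maximum-likelihood variance estimate
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def one_hot(values):
    """Indicator columns for a discrete variable, dropping the first category."""
    categories = sorted(set(values))[1:]
    return [[1.0 if v == c else 0.0 for v in values] for c in categories]


def chi2_sf(stat, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    z = stat / 2.0
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(z)) if df % 2 else math.exp(-z)
    while a < df / 2.0:  # recurrence Q(a + 1, z) = Q(a, z) + z^a e^-z / Gamma(a + 1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return q


def lr_independence_test(x, y_discrete, z_columns):
    """Likelihood ratio test of X independent of Y given Z (X continuous, Y discrete)."""
    extra = one_hot(y_discrete)
    if not extra:
        raise ValueError("y_discrete needs at least two categories")
    ll_null = gaussian_loglik(x, z_columns)         # X ~ Z
    ll_alt = gaussian_loglik(x, z_columns + extra)  # X ~ Z + Y
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    df = len(extra)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.gauss(0, 1) for _ in range(300)]
    y = [rng.choice("abc") for _ in range(300)]
    shift = {"a": 0.0, "b": 3.0, "c": -3.0}
    x = [2 * zi + shift[yi] + rng.gauss(0, 1) for zi, yi in zip(z, y)]
    print(lr_independence_test(x, y, [z]))
```

A constraint-based algorithm such as PC only needs a yes/no independence decision, so a p-value like the one returned here (compared against a significance level) is enough for it to run on mixed data.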

I should put in a plug for GRaSP, which was recently accepted to UAI. It's actually more accurate than PC or FGES and can take either a score or a test (you provide both and set which in the parameters). This is in 7.1.0. We're finishing up final edits for the camera-ready paper but once it's all done I'd be happy to send you the paper. There's also a version of FCI using GRaSP in 7.1.0 that hasn't been published yet, but it's more accurate than FCI or even GFCI.

By the way, the reason I'm able to respond quickly to messages is that after all these years I've hooked GitHub up to Slack, which I watch incessantly. So when you post an issue in GitHub I get a notification in Slack, and I'm conditioned to pay attention to those.

williamty commented 2 years ago

You are so amazing! THANK YOU!!! I will buy you a drink once you come to Beijing, yes, let me know if you come here. My email is william8620@gmail.com