Can tetrad generate directed matrix? - Githubissues

cmu-phil / tetrad

Repository for the Tetrad Project, www.phil.cmu.edu/tetrad.

GNU General Public License v2.0

Can tetrad generate directed matrix? #1621

Closed magical-hhc closed 1 year ago

magical-hhc commented 1 year ago

I have a set of data. After using the PC algorithm to get the directed acyclic graph, how can I get the directed matrix that reflects the causal relationships between variables based on the causality test? Normally, I can use 0 and 1 myself to indicate the absence or presence of a directed edge, but if I need more information, such as the weight, distance, or other attributes of the directed edge, then the values in the directed matrix will not only be 0 or 1. I cannot complete this task, and I would appreciate any advice. (PS: I could adjust the values in the directed matrix according to the causality found under different significance levels, but I don't know whether this is a good idea.)

jdramsey commented 1 year ago

@magical-hhc There are some issues here.

PC is a CPDAG algorithm, meaning that not all edges will be directed edges. That is, the output represents an equivalence class of directed acyclic graphs (DAGs); you can choose a DAG from this equivalence class in various ways by orienting one of the undirected edges, seeing what other orientations are implied and orienting those, and continuing in this way until you have a DAG. When you say "directed matrix," you mean a matrix representing a DAG--or do I mistake you?
Another issue is that due to unfaithfulness, PC will often not produce a perfect CPDAG. Algorithms that always produce perfect CPDAGs are FGES and GRaSP. You may want to look at those.
Even if you have a DAG, you may not have a parameterized DAG, as you say. Some algorithms like LiNGAM and related linear, non-Gaussian algorithms give you parameterized weight matrices; however, for those, you typically need to be in a situation where you can assume linearity and non-Gaussianity. If you have a DAG and a dataset, though, and your data is linear, you can easily estimate the parameters. And you're right; it would be super-slick if the estimator could output the weight matrix for you for use in other software. I'll think about that.
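
For readers who want to do the estimation step outside Tetrad, here is a minimal Python sketch of fitting edge weights for a known DAG by ordinary least squares, regressing each variable on its parents. It is not Tetrad's SEM estimator. The function name `estimate_weights` and the convention that entry `[i, j]` holds the coefficient for the edge i -> j are assumptions made for illustration, and the sketch assumes linear relationships with independent errors.

```python
import numpy as np

def estimate_weights(data, adjacency):
    """OLS estimate of edge weights for a linear model with a known DAG.

    data: (n_samples, n_vars) array.
    adjacency[i, j] == 1 means there is a directed edge i -> j.
    Returns (weights, resid_var), where weights[i, j] is the estimated
    coefficient of variable i in the equation for variable j, and
    resid_var[j] is the residual (exogenous) variance of variable j.
    """
    data = np.asarray(data, dtype=float)
    adjacency = np.asarray(adjacency)
    centered = data - data.mean(axis=0)  # centering removes the intercepts
    n_vars = data.shape[1]
    weights = np.zeros((n_vars, n_vars))
    resid_var = np.zeros(n_vars)
    for j in range(n_vars):
        parents = np.flatnonzero(adjacency[:, j])
        if parents.size == 0:
            # No parents: the variable is exogenous, so its variance is its own.
            resid_var[j] = centered[:, j].var()
            continue
        coef, *_ = np.linalg.lstsq(centered[:, parents], centered[:, j], rcond=None)
        weights[parents, j] = coef
        resid = centered[:, j] - centered[:, parents] @ coef
        resid_var[j] = resid.var()
    return weights, resid_var
```

The resulting matrix has zeros wherever the DAG has no edge, so it can stand in for the 0/1 adjacency matrix when real-valued weights are needed.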

By the way, I just reorganized the search package in Tetrad (did I tell you this? Maybe..) and gave documentation for all of the algorithms, with paper references--here it is:

https://www.phil.cmu.edu/tetrad-javadocs/temp/edu/cmu/tetrad/search/package-summary.html

This still needs to be put in the published Tetrad version, but it should be soon. It may be helpful if you are deciding which algorithm to use for your project.

jdramsey commented 1 year ago

@magical-hhc I updated that last comment a little; Grammarly made a mistake. PC sometimes will not produce a perfect CPDAG; sometimes, it may contain bidirected edges, so you can't make an adjacency matrix from it in the usual way. Also, I was wondering if there wasn't something you could already use in Tetrad for saving graphs for use in Python or R. There are several formats for saving graphs; I'm curious if one of them works for you. Here's a picture:

[Image: Screenshot 2023-05-19 at 12 29 39 PM]

Unfortunately, everyone seems to have their favorite method for saving graphs, so there might be better options.
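
As an illustration of why a graph with undirected or bidirected edges cannot be turned into an adjacency matrix "in the usual way", here is a small Python sketch that builds a 0/1 matrix from a list of edge strings and refuses anything that is not a plain directed edge. The edge notation (`-->`, `<--`, `---`, `<->`) and the function name `edges_to_adjacency` are assumptions for this example; it does not parse any particular Tetrad save format.

```python
import numpy as np

def edges_to_adjacency(nodes, edges):
    """Build a 0/1 adjacency matrix from edge strings such as 'X1 --> X2'.

    adjacency[i, j] == 1 means nodes[i] --> nodes[j].
    Raises ValueError for any edge that is not fully directed
    (for example 'X1 --- X2' or 'X1 <-> X2').
    """
    index = {name: i for i, name in enumerate(nodes)}
    adjacency = np.zeros((len(nodes), len(nodes)), dtype=int)
    for edge in edges:
        left, arrow, right = edge.split()
        if arrow == "-->":
            adjacency[index[left], index[right]] = 1
        elif arrow == "<--":
            adjacency[index[right], index[left]] = 1
        else:
            raise ValueError(f"edge '{edge}' is not directed; the graph is not a DAG")
    return adjacency

# Usage: a three-node chain X1 --> X2 --> X3.
chain = edges_to_adjacency(["X1", "X2", "X3"], ["X1 --> X2", "X2 --> X3"])
```

Failing loudly on non-directed edges is a deliberate choice here: silently coding them as 0 or 1 would hide the fact that the search output was not a DAG.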

Regarding saving weighted graphs, there's nothing in the interface to save those out as matrices, but there should be, in a couple of places, for instance, in the SEM Estimator. I'll think about adding that.

jdramsey commented 1 year ago

Well, there's one place you could get them. I just reprogrammed the LiNGAM and LiNG-D algorithms from scratch, and the weighted B Hat matrix (or matrices) is printed to the console. Those algorithms require that you have linear, non-Gaussian data, but if you do, they're working pretty well. (If you have more than one Gaussian variable in your dataset, they will produce nonsense.) The way to get that weighted matrix would be to launch the Tetrad interface using the java -jar option, then run LiNGAM on your data. The BHat matrix or matrices will be printed in the Terminal window. It could also be printed in the log window if you turn on logging in the interface; I can't remember off the top of my head.

magical-hhc commented 1 year ago

Ok, I will try it with reference to your suggestion.

jdramsey commented 1 year ago

Nice. By the way, can you leave this issue open so I don't forget to work out the issue of saving parameterized matrices?

magical-hhc commented 1 year ago

Of course!

jdramsey commented 1 year ago

Let me take this one up. OK, I aim to save the B matrix for a DAG estimated using the SEM estimator. Of course we need to save the exogenous variances as well. Maybe this format?

Variables:

a d f s

Exogenous variances:

a b c d e...

B Hat:

a 0 0. 0 ...
b d. 0 f ...
0 e ...

OK, in the development branch I've added menu items to SEM IM Editor and SEM Estimator Editor to copy the coefficient (B Hat) matrix and the error covariance matrix to the clipboard. You can then paste them in a text file.
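
Once the coefficient matrix and the error variances are in another environment, one common use is to recover the covariance matrix the model implies, which is also why the variances need to be saved alongside B Hat. Below is a minimal Python sketch under stated assumptions: the errors are independent (so the error covariance matrix is diagonal), and entry `[i, j]` of the coefficient matrix is the coefficient of variable i in the equation for variable j. The orientation Tetrad uses for its copied matrix is not stated in this thread, so transpose if needed. The function name `implied_covariance` is hypothetical.

```python
import numpy as np

def implied_covariance(b_hat, exogenous_variances):
    """Covariance matrix implied by a linear SEM.

    Model: x_j = sum_i b_hat[i, j] * x_i + e_j, with independent errors e_j
    whose variances are given in exogenous_variances.
    Solving for x gives x = (I - B^T)^{-1} e, so
    Sigma = (I - B^T)^{-1} Omega (I - B^T)^{-T}.
    """
    b_hat = np.asarray(b_hat, dtype=float)
    omega = np.diag(np.asarray(exogenous_variances, dtype=float))
    identity = np.eye(b_hat.shape[0])
    total_effects = np.linalg.inv(identity - b_hat.T)
    return total_effects @ omega @ total_effects.T

# Usage: x0 = e0, x1 = 2 * x0 + e1, both errors with variance 1.
sigma = implied_covariance([[0, 2], [0, 0]], [1, 1])
```

Comparing this implied covariance with the sample covariance of the data is a quick sanity check that the pasted matrices were read in with the intended orientation.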

That's the fix I was intending to do for this, and I was the one who asked for the issue to stay open, so I'll close it. @magical-hhc, open it back up if it wasn't what you intended.

This is in the development branch now and will be published in the next publish cycle.