Request: Show how to run FCI on a DAG with specified latents and with defined knowledge

jdramsey commented 5 months ago

There are many ways to do this; here is a basic way. If this isn't what you're looking for (like if you want to load in the graph and knowledge from a file, say) let me know and I'll revise it.

If you install py-tetrad according to the instructions here:

https://github.com/cmu-phil/py-tetrad

this should work.

import sys

import jpype.imports

BASE_DIR = ".."
sys.path.append(BASE_DIR)
jpype.startJVM(classpath=[f"{BASE_DIR}/pytetrad/resources/tetrad-current.jar"])

import edu.cmu.tetrad.graph as tg
import edu.cmu.tetrad.search as ts
import edu.cmu.tetrad.data as td
import java.util as jutil

nodes = jutil.ArrayList()

for i in range(0, 10):
    nodes.add(tg.GraphNode('x' + str(i)))

dag = tg.EdgeListGraph(nodes)

dag.addDirectedEdge(nodes.get(0), nodes.get(1))
dag.addDirectedEdge(nodes.get(2), nodes.get(5))
dag.addDirectedEdge(nodes.get(1), nodes.get(8))
dag.addDirectedEdge(nodes.get(3), nodes.get(9))
dag.addDirectedEdge(nodes.get(4), nodes.get(6))
dag.addDirectedEdge(nodes.get(5), nodes.get(7))
dag.addDirectedEdge(nodes.get(2), nodes.get(9))
dag.addDirectedEdge(nodes.get(4), nodes.get(7))
dag.addDirectedEdge(nodes.get(2), nodes.get(8))

nodes.get(0).setNodeType(tg.NodeType.LATENT)
nodes.get(2).setNodeType(tg.NodeType.LATENT)
nodes.get(3).setNodeType(tg.NodeType.LATENT)
nodes.get(6).setNodeType(tg.NodeType.LATENT)

print(dag)

kn = td.Knowledge()
kn.addToTier(0, 'x0')
kn.addToTier(0, 'x1')
kn.addToTier(0, 'x2')
kn.addToTier(0, 'x3')
kn.addToTier(0, 'x4')
kn.addToTier(1, 'x5')
kn.addToTier(1, 'x6')
kn.addToTier(1, 'x7')
kn.addToTier(1, 'x8')
kn.addToTier(1, 'x9')

fci = ts.Fci(ts.test.MsepTest(dag))
fci.setKnowledge(kn)
pag = fci.search()

print(pag)

ungvilde commented 5 months ago

Thank you so much, this was very helpful!

But is there any way I can work with the PAG to make plots/compare with original graph?

jdramsey commented 5 months ago

Sure--I can write out some more code for you tomorrow. (Sorry, today got away from me with projects.)

There's a Python class in py-tetrad called TetradSearch.py; you can use the FCI function in that and add the knowledge. If you do that, you can grab the graph back in DOT format and plot it using GraphViz if you have that installed.

To be completely honest, if you want to do this with the least effort, probably using the Java interface would be more straightforward, though it can all be done in Python with no problem :-D

When you say, "compare with the original graph," what do you mean? Did you want to compare a DAG to a PAG? What would the basis be for your comparison?

ungvilde commented 5 months ago

I see! So there is not an adjacency matrix similar to the one I get when I use FCI in causal-learn?

I’m a little new to PAGs, so maybe my thinking is wrong. But I’d like to try to assess how informative the PAG is, in some sense. I’m working with time series graphs with latent variables, and I’d like to see if the PAG can be used to reason causally about the summary graph of the full time series graph. If I can identify confounders then I can design experiments in a more informed way, for example.

On 29 Jan 2024, at 23:08, Joseph Ramsey @.***> wrote:

Sure--I can write out some more code for you tomorrow. (Sorry, today got away from me with projects.)

There's a Python class in py-tetrad called TetradSearch.py; you can use the FCI function in that and add the knowledge. If you do that, you can grab the graph back in DOT format and plot it using GraphViz if you have that installed.

To be completely honest, if you want to do this with the least effort, probably using the Java interface would be more straightforward, though it can all be done in Python with no problem :-D

When you say, "compare with the original graph," what do you mean? Did you want to compare a DAG to a PAG? What would the basis be for your comparison?

— Reply to this email directly, view it on GitHub https://github.com/cmu-phil/py-tetrad/issues/18#issuecomment-1915661282, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQBFUGVS5WLNKZW6MRKB4PDYRAMWHAVCNFSM6AAAAABCOKLK7WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJVGY3DCMRYGI. You are receiving this because you commented.

jdramsey commented 5 months ago

There is an adjacency matrix, and the edges don't necessarily correspond to the edges in the original DAG just because some of the variables are latent, so there may be edges that indicate "latent paths" instead of "DAG paths." For instance, if I have the DAG X<-(Y)->Z, where Y is latent, and I run FCI on this, I get an edge Xo-oY, but the adjacency Xo-oY isn't wrong, even though there is not an adjacency X-Y in the original DAG.

ungvilde commented 5 months ago

Hi, sorry to bother you again. But I was wondering, if there is a reason I get different results using causal-learn and py-tetrad, when I apply FCI to the same graph with the same background knowledge?

With Py-Tetrad:

nodes = jutil.ArrayList()

for i in range(1, 7):
    nodes.add(tg.GraphNode('x' + str(i)))

dag = tg.EdgeListGraph(nodes)

dag.addDirectedEdge(nodes.get(0), nodes.get(1))
dag.addDirectedEdge(nodes.get(2), nodes.get(3))
dag.addDirectedEdge(nodes.get(4), nodes.get(5))

dag.addDirectedEdge(nodes.get(2), nodes.get(1))
dag.addDirectedEdge(nodes.get(2), nodes.get(5))

print(dag)

kn = td.Knowledge()
kn.addToTier(0, 'x1')
kn.addToTier(0, 'x3')
kn.addToTier(0, 'x5')

kn.addToTier(1, 'x2')
kn.addToTier(1, 'x4')
kn.addToTier(1, 'x6')

kn.setTierForbiddenWithin(0, True)
kn.setTierForbiddenWithin(1, True)

kn.setRequired('x1','x2')
kn.setRequired('x3','x4')
kn.setRequired('x5','x6')

fci = ts.Fci(ts.test.MsepTest(dag))
fci.setKnowledge(kn)
pag = fci.search()

print(pag)

The printed results is

Graph Edges:
1. x1 --> x2
2. x2 <-> x3
3. x3 --> x4
4. x5 --> x6
5. x6 <-> x3

With causal-learn:

G = nx.DiGraph()
G.add_nodes_from(np.arange(6))
G.add_edges_from([(0, 1), (2, 3), (4, 5), (2, 1), (2, 5)])

pag, _ = fci(np.empty(shape=(10**2, 6)), 
            d_separation, 
            true_dag=G,
            )
nodes = pag.get_nodes()

kn = BackgroundKnowledge()
kn.add_node_to_tier(nodes[0], 0)
kn.add_node_to_tier(nodes[2], 0)
kn.add_node_to_tier(nodes[4], 0)

kn.add_node_to_tier(nodes[1], 1)
kn.add_node_to_tier(nodes[3], 1)
kn.add_node_to_tier(nodes[5], 1)

kn.add_required_by_node(nodes[0], nodes[1])
kn.add_required_by_node(nodes[2], nodes[3])
kn.add_required_by_node(nodes[4], nodes[5])

pag, _ = fci(np.empty(shape=(10**2, 6)), 
            d_separation, 
            true_dag=G,
            background_knowledge=kn,
            )
print(pag)

Here, the printed result is

Graph Edges:
1. X1 --> X2
2. X3 --> X2
3. X3 --> X4
4. X3 --> X6
5. X5 --> X6

jdramsey commented 5 months ago

I guess what you're asking me is to say whether the knowledge here is working correctly. I'll look at the case later. My sense is that it is; knowledge in FCI for Tetrad has been pretty well used and tested. But I'll pull up this example in a bit and have a look. (Sorry, I got busy the last couple of days--)

jdramsey commented 5 months ago

Oh I see--you've said that ind(X2, X6) (because you forbade edges within tier 1) so X2<-oX30->X6 had to be oriented as a collider.

ungvilde commented 5 months ago

I guess I’m a little surprised by this result. I’m trying to incorporate the fact that I know that nodes in the same tier cannot have a causal effect on each other. Is there some way to do this?

I would have imagined that, if I know direct causal interactions cannot occur within a tier, and X3 is observed, then the PAG should not have the edge X2<->X6.

jdramsey commented 5 months ago

Sorry, I've been caught up in projects. The problem with forbidding all edges in the second tier in your example is that you are saying that X2 || X6, i.e., you don't have to condition on X3 to get the independence (i.e., you don't have X2 || X6 | X3), so if FCI is able to orient the collider X2->X3<-X6 it will.

Maybe if you tell me more about what you're trying to accomplish in the bigger picture, I can give better advice..? Like maybe you need a different algorithm maybe?

Again sorry for the radio silence; I've had a couple of all-nighters.

jdramsey commented 5 months ago

Also, notice that FCI's model actually does ensure that there are no edges among the variables in the second tier. I take it that's not what you're looking for, though.

ungvilde commented 5 months ago

No worries, I'm just very appreciative of your help :)

My task is to do causal discovery on time series representing neurons (spike patterns over time) -- I know that (1) neurons have autoregressive effects (so Xi:t -> Xi:t+1 is required) and (2) that neuron i cannot have a causal effect on neuron j within the same time bin t (so Xi:t -> Xj:t is forbidden). Also, of course, effects cannot occur back in time (so Xi:t+1 -> Xj:t is forbidden).

This is essentially the knowledge I want to incorporate, and I think it works (at least it seems to work) when I dont have setTierForbiddenWithin to be True.

Finally, I would like to use interventional data to extract more causal relations between the neurons, but I haven't yet figured out if I can use FCI/Tetrad for this.

jdramsey commented 5 months ago

You may find this paper interesting; we just got this published this year in Neurips:

https://arxiv.org/abs/2310.17679

This is for the BOSS algorithm, another algorithm in Tetrad. As you can see from the paper, it shows promise for fMRI data, and thanks to Bryan (the first author), it has a robust handling of background knowledge. It may help you.

What I don't have a firm grasp on is how to incorporate interventional knowledge. I'll have to think about that.

jdramsey commented 5 months ago

I guess it depends on what kind of interventions you have.

jdramsey commented 5 months ago

One algorithm that is set up to handle interventions already is the GIES algorithm in PCALG in R. You could check that out. I think they have a knowledge facility as well, though it's different from Tetrad's.

jdramsey commented 3 months ago

Sounds like this is OK--if not reopen or make another issue :) Closing for now.

cmu-phil / py-tetrad

Request: Show how to run FCI on a DAG with specified latents and with defined knowledge #18