bd2kccd / causal-cmd


Unable to get knowledge to work #76

Closed: agentzel closed this issue 1 year ago

agentzel commented 1 year ago

Apologies if this is just user error, but I can't get any applications of background knowledge to stick. As a minimal example, I generated data in R using:

data <- data.frame(A = sample(1:30, 1000, replace=TRUE))
data$B <- data$A + sample(c(0,1,2), 1000, replace=TRUE)
data$C <- data$A + sample(c(0,1), 1000, replace=TRUE)
write.csv(data, "temp.csv", row.names=FALSE)

So the structure is B <- A -> C, though the edges may not be orientable from the data alone. I then created a knowledge file, prior.txt, containing:

/knowledge
addtemporal
1 A
2 B C
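To make the intended meaning of this file concrete, here is a minimal Python sketch of how an addtemporal section can be read: each line gives a tier number followed by the variables in that tier, and an edge from a later tier into an earlier one is forbidden. The names parse_temporal_tiers and edge_forbidden are hypothetical; this is not the Tetrad or causal-cmd parser, and it only handles plain tier lines.

```python
def parse_temporal_tiers(text):
    """Map each variable to its tier number from the addtemporal section (hypothetical parser)."""
    tiers = {}
    in_temporal = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "addtemporal":
            in_temporal = True
            continue
        # Any other section header ends the temporal section.
        if line.startswith("/") or line in ("forbiddirect", "requiredirect"):
            in_temporal = False
            continue
        if in_temporal:
            tier, *names = line.split()
            for name in names:
                tiers[name] = int(tier)
    return tiers


def edge_forbidden(tiers, cause, effect):
    """An edge cause -> effect is forbidden if the cause sits in a later tier than the effect."""
    if cause not in tiers or effect not in tiers:
        return False  # names the knowledge does not mention impose no constraint
    return tiers[cause] > tiers[effect]


KNOWLEDGE = """/knowledge
addtemporal
1 A
2 B C
"""

tiers = parse_temporal_tiers(KNOWLEDGE)
print(tiers)                              # {'A': 1, 'B': 2, 'C': 2}
print(edge_forbidden(tiers, "B", "A"))    # True
print(edge_forbidden(tiers, "A", "B"))    # False
print(edge_forbidden(tiers, '"B"', '"A"'))  # False: quoted names are not in the knowledge
```

The last line hints at the failure mode discussed later in the thread: a name that does not match the knowledge file exactly is simply unconstrained.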

I then ran:

java -jar causal-cmd-1.3.0-jar-with-dependencies.jar --algorithm fges --dataset temp.csv --delimiter comma --score cg-bic-score --data-type continuous --knowledge prior.txt

However, the resulting output is:

Graph Edges:

  1. "A" --- "B"
  2. "C" --- "A"
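Undirected edges are what one would expect if the knowledge is not being applied: B <- A -> C has no collider, so several DAGs with the same skeleton are Markov equivalent and FGES cannot choose among them from the data alone. A small Python sketch (the helper names are hypothetical) that enumerates every orientation of the skeleton and keeps those with the same v-structures as the true model:

```python
from itertools import product

# Skeleton of the simulated model: A is adjacent to B and to C; B and C are not adjacent.
SKELETON = [("A", "B"), ("A", "C")]


def v_structures(edges):
    """Return unshielded colliders X -> Z <- Y (X, Y non-adjacent) as (frozenset({X, Y}), Z)."""
    adjacent = {frozenset(e) for e in edges}
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
    found = set()
    for child, pars in parents.items():
        for x in pars:
            for y in pars:
                if x < y and frozenset((x, y)) not in adjacent:
                    found.add((frozenset((x, y)), child))
    return found


def orientations(skeleton):
    """Yield every way of directing each undirected skeleton edge."""
    for flips in product([False, True], repeat=len(skeleton)):
        yield [(b, a) if flip else (a, b) for (a, b), flip in zip(skeleton, flips)]


# The data-generating model B <- A -> C has no v-structures.
true_model = [("A", "B"), ("A", "C")]
equivalent = [dag for dag in orientations(SKELETON)
              if v_structures(dag) == v_structures(true_model)]
for dag in equivalent:
    print(dag)
```

Three of the four orientations survive; only the collider B -> A <- C is ruled out. With the tiers from prior.txt actually applied, B -> A and C -> A are forbidden, leaving A -> B and A -> C as the only admissible orientation.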

I've tried adding knowledge as forbiddirect, I've tried different score functions and algorithms, I've tried different dependency structures for the synthetic data...but nothing is changing when I modify the knowledge file. From what I can tell, the format is correct (using the format in https://bd2kccd.github.io/docs/causal-cmd/). Has the format changed?

jdramsey commented 1 year ago

@agentzel Sorry, I just saw this. (I don't normally look at issues in causal-cmd, sorry.) This should work; knowledge with FGES is working correctly in the interface AFAIK, and it's using the same API underneath. Are you still having problems with it? I could look further in the coming days...

jdramsey commented 1 year ago

@agentzel Oh wait, I may know the answer to this! Someone else recently reported a bug in the GUI's Knowledge editor, where drag and drop failed for variables with quotation marks around their names, so I fixed that. But the issue in your case is that variable names with quotation marks around them are treated as entirely different variables from names without them, so "A" is a different variable name than A, for instance. The thing to do, I'm sure of it, is to remove the quotation marks so that the names in your data file and your knowledge file match, and then it should work correctly.
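A minimal Python illustration of the mismatch, assuming the first line of temp.csv is what R's write.csv produces by default ("A","B","C", because quote = TRUE is the default): if a reader keeps the quote characters as part of the names, none of the knowledge-file names match. The function unmatched_names is hypothetical and only mimics the effect; it is not how causal-cmd's reader is implemented.

```python
def unmatched_names(header_line, knowledge_names, delimiter=","):
    """Return knowledge-file names that do not appear verbatim in a naively split header."""
    data_names = header_line.split(delimiter)  # quote characters are kept as part of each name
    return [name for name in knowledge_names if name not in data_names]


# Header as written by R's write.csv with its default quote = TRUE.
QUOTED_HEADER = '"A","B","C"'
# Header as written with quote = FALSE.
PLAIN_HEADER = "A,B,C"

print(unmatched_names(QUOTED_HEADER, ["A", "B", "C"]))  # ['A', 'B', 'C']
print(unmatched_names(PLAIN_HEADER, ["A", "B", "C"]))   # []
```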

Sorry, I was just being dumb. Happens.

jdramsey commented 1 year ago

@agentzel By the way, the GUI issue will be fixed in the upcoming release 7.1.3-2; it is issue 30 in this list:

https://github.com/cmu-phil/tetrad/wiki/Forthcoming-fixes-for-7.1.3_2

This release should happen very soon. But still, if the data variables don't have quotation marks but the variables in the knowledge file do, it will not recognize that these are the same variables.

If the data variables do have quotation marks around them, one thing you can do is specify " as the quotation character when you read the data in; the variables will then be read correctly, without quotation marks around their names.
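Honoring the quote character strips the quotes from the names. A minimal sketch with Python's standard csv module, using a short illustrative stand-in for temp.csv (the data row is made up, not taken from the issue); it also writes the table back out without quotes, which is another way to make the names match. In R, passing quote = FALSE to write.csv avoids the quotes in the first place.

```python
import csv
import io

# Illustrative stand-in for temp.csv as written by R's write.csv (quoted header).
RAW = '"A","B","C"\n12,13,12\n'

# Treat " as the quote character so it is not kept as part of each name.
reader = csv.reader(io.StringIO(RAW), delimiter=",", quotechar='"')
header = next(reader)
print(header)  # ['A', 'B', 'C']

# Re-write the table; the default QUOTE_MINIMAL leaves these plain names unquoted.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
for row in reader:
    writer.writerow(row)
print(out.getvalue())
```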

jdramsey commented 1 year ago

I believe this is fixed, so I'll close it. If you're still having problems, @agentzel, let us know.