cmu-phil / tetrad

Repository for the Tetrad Project, www.phil.cmu.edu/tetrad.
GNU General Public License v2.0

"Run Search & Generate Graph" button is clicked, but no reaction is produced. #1283

Closed. gl97at closed this issue 3 years ago.

gl97at commented 4 years ago

Hi there,

I loaded a mixed dataset consisting of continuous variables and categorical variables with at most 3 categories each. Some values, in both types of variables, are missing, so they are represented by an asterisk (*). No errors were reported while loading the dataset.

Then I tried to construct the causal graph using the "Search" button. I selected an algorithm and values for its parameters and pressed the "Run Search & Generate Graph" button, but nothing happened. I tried each of the available algorithms, adjusting the parameters for each, but again nothing happened when I pressed the button. Any ideas?

PS: The button can be clicked, but no reaction is produced.

Thank you very much in advance!

kvb2univpitt commented 4 years ago

Hello. What version of Tetrad and Java are you using?

gl97at commented 4 years ago

Hello, thank you very much for your prompt answer! I am using Tetrad 6.7.1 and Java 12.

kvb2univpitt commented 4 years ago

Tetrad 6.7.1 is built and tested on Java 8. Please use Java 8. We have noticed that Tetrad 6.7.1 is unstable on Java 9+. We are planning to support newer versions of Java in future releases. I will leave this issue open in case it's not a Java version problem.

gl97at commented 4 years ago

Thank you very much for your help! I will use Java 8, as you suggested, and I hope that it will work!

gl97at commented 4 years ago

Hi there, I am now using Tetrad 6.7.1 with Java 8 for the mixed dataset described above. Again, the dataset loaded with no errors, and then I tried to construct the causal graph using the "Search" button. After selecting an algorithm and values for its parameters, I pressed the "Run Search & Generate Graph" button, but again nothing happened. However, when I close the "Search" window, I get the following error:

[Screenshot: error message]

I am trying to figure out why I get this error message and how I can ensure that the cutoffs are in "nondecreasing order".

I found that line 334 of https://github.com/benoslab/tetradLite/blob/master/src/edu/cmu/tetrad/data/Discretizer.java contains that error message. Is that useful?

I am looking forward to your answer. Thank you in advance.

jdramsey commented 4 years ago

I think this happens when you have cutoffs like [0 0 .25 .7 1]. Maybe the thing to do is reduce the number of categories? I'm guessing; I don't have your data...
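For readers unfamiliar with cutoffs: when a continuous column is discretized into k categories, k - 1 cutoffs mark the boundaries between categories, and with equal-frequency binning they are read off the sorted values. The Python sketch below is a generic illustration, not Tetrad's Discretizer, and the function names are hypothetical. It shows how a column with many tied values can yield repeated cutoffs such as [0, 0, 0.25, 0.7], the kind of pattern described above.

```python
def equal_frequency_cutoffs(values, num_bins):
    """Return num_bins - 1 cutoffs that split the sorted values into roughly equal-size bins.

    Generic equal-frequency discretization sketch (not Tetrad's implementation).
    """
    data = sorted(values)
    n = len(data)
    # Cutoff i is the first value of bin i; heavy ties can make neighbouring cutoffs equal.
    return [data[(i * n) // num_bins] for i in range(1, num_bins)]


def has_repeated_cutoffs(cutoffs):
    """True if two adjacent cutoffs are equal, so the bin between them would be empty."""
    return any(a == b for a, b in zip(cutoffs, cutoffs[1:]))


# A column dominated by zeros: the two lowest cutoffs collapse onto the same value.
column = [0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.7, 1.0]
cuts = equal_frequency_cutoffs(column, 5)
print(cuts)                        # [0, 0, 0.25, 0.7]
print(has_repeated_cutoffs(cuts))  # True
```

Asking for fewer bins spaces the quantile positions further apart, which makes such collisions less likely; that is presumably why reducing the number of categories might help.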

gl97at commented 4 years ago

Thank you very much for your answer.

Well, my mixed dataset consists of continuous variables and categorical variables with at most 3 categories each. (Some values, in both types of variables, are missing, so they are represented by an *.)

Would you be so kind as to explain what these cutoffs are and how they are used?

Should I reduce the number of categories per variable? There are only 3 categories per variable, so would that help?

I am looking forward to your answer!

jdramsey commented 4 years ago

Oh, wait, are there any missing values in the columns you're discretizing?

-- Joseph D. Ramsey Special Faculty and Director of Research Computing Department of Philosophy 135 Baker Hall Carnegie Mellon University Pittsburgh, PA 15213

jsph.ramsey@gmail.com Office: (412) 268-8063 http://www.andrew.cmu.edu/user/jdramsey
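A sketch of why missing values could matter here, assuming a missing entry ends up stored as NaN and is not filtered out before the cutoffs are computed: every comparison involving NaN is false, so a NaN among the cutoffs fails any ordering check. The function names below are hypothetical and the check is written generically; this is not Tetrad's code.

```python
import math


def is_nondecreasing(cutoffs):
    """True only if every adjacent pair satisfies a <= b.

    Any comparison involving NaN is False, so a NaN anywhere in a list of
    two or more cutoffs makes this check fail.
    """
    return all(a <= b for a, b in zip(cutoffs, cutoffs[1:]))


def drop_missing(values):
    """Remove NaN entries (assumed encoding of '*' missing values) before binning."""
    return [v for v in values if not math.isnan(v)]


nan = float("nan")
print(is_nondecreasing([0.1, nan, 0.7]))   # False: NaN breaks the ordering test
print(is_nondecreasing([0.1, 0.25, 0.7]))  # True
print(drop_missing([0.3, nan, 0.1]))       # [0.3, 0.1]
```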

gl97at commented 4 years ago

Well, yes! Both continuous and categorical variables contain missing values!

Just to be clear: I load the dataset, selecting "mixed" as the data type, and then I construct a causal graph. I do not discretize any of the variables. Should I?

jdramsey commented 4 years ago

Could you impute the missing values with their column mode and try again?
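A minimal sketch of column-mode imputation, assuming the data is read as text with "*" marking missing values, as in this thread. The helper names are hypothetical. The mode is a crude choice for continuous columns, so treat this as a quick way to check whether missing values are causing the problem rather than as a recommended imputation method.

```python
from collections import Counter


def impute_column_mode(column, missing="*"):
    """Replace missing tokens in one column with that column's most frequent observed value.

    Ties go to the value seen first (Counter.most_common orders equal counts by first occurrence).
    """
    observed = [v for v in column if v != missing]
    if not observed:
        raise ValueError("column has no observed values to take a mode from")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == missing else v for v in column]


def impute_table_mode(rows, missing="*"):
    """Apply column-mode imputation to every column of a list-of-rows table."""
    columns = [list(col) for col in zip(*rows)]
    filled = [impute_column_mode(col, missing) for col in columns]
    return [list(row) for row in zip(*filled)]


rows = [["a", "1.5"], ["*", "2.0"], ["a", "*"], ["b", "2.0"]]
print(impute_table_mode(rows))  # [['a', '1.5'], ['a', '2.0'], ['a', '2.0'], ['b', '2.0']]
```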

gl97at commented 4 years ago

Thank you very much for your prompt answer! I tried what you suggested, and it works! Thank you very much! Unfortunately, given the nature of this data, I don't think it makes sense to impute missing values; it seems like it would introduce bias. Is there any other solution you can think of?

jdramsey commented 4 years ago

Hold on. Why do you think that not imputing the values, and letting the algorithms deal with them however they want, does not introduce bias?

gl97at commented 4 years ago

To be honest, I had not realized that before: letting the algorithms deal with missing values however they want can introduce bias, too! So I guess imputing missing values is useful and necessary...

As I see it, there are many ways to impute missing values. Should I use the column mode, or was that just a suggestion?

PS: I thought that Tetrad could accept missing values in categorical and continuous data.

jdramsey commented 4 years ago

Truth be told, if you're doing a serious analysis for publication, you should always impute your own values. There are lots of methods for imputing values (look in R, for instance, to see what you can find); you should pick the one that best suits your data's assumptions.

There are some "correct" ways to deal with missing data. For instance, FCI is correct if you do independence testing with test-wise deletion. I've not had a chance to implement that, but I should. Maybe I'll do it. There are some other ways as well: there's a way of running PC while keeping track of the missing-value model. Again, I haven't implemented this. Short of methods like these, the best thing to do is a good imputation of your missing values.

Another problem is that the mixed likelihood scores we're using do not handle missing values, as you've seen. I'm not sure I know how to make them do that. All well and good, but if we're not going to handle missing values, we should scan the data before analyzing it and tell you so.

gl97at commented 4 years ago

Well, I will look for methods for imputing missing values, including among Tetrad's available methods, and decide which of them suits my data. Thank you very much for your help!

jdramsey commented 4 years ago

You are welcome!

SamuelKingwill commented 4 years ago

Hello,

I am having a similar problem, have read through the above thread, but am still stuck. I hope that you could help.

I am using Java 8 and Tetrad 6.7.1. I have both continuous and binary variables, so I loaded the data using the "mixed discrete and continuous" type. I have imputed missing values appropriately for each feature.

I am looking to use the FGES-MB algorithm to find the Markov blanket of my target in the DAG, and afterwards to speak to domain experts to clarify uncertain relationships.

When I click "Run Search and Generate Graph", the button goes grey and a dialog box pops up for a second saying "Executing", but then nothing happens. When I click "Done", it says I have not performed a search. I'm not sure where to go from here. I have attached a picture of my settings. I am not sure whether I need to set the structure prior coefficient.

Any help would be greatly appreciated.

[Screenshot: FGES-MB search settings]

jdramsey commented 4 years ago

Hmm. How many variables? Do you think it's a dense graph? Maybe try setting "Yes if continuous variables should be discretized when child is discrete" to Yes.

jdramsey commented 4 years ago

Or the degenerate Gaussian score...?

SamuelKingwill commented 4 years ago

Thank you for the swift reply.

It is definitely a dense graph. I have a Target, a Treatment with four categories that I have one-hot encoded, and 84 features. There are 300,000 instances. Could this all be too much?

I tried with that setting set to Yes, and nothing changed. Since I selected mixed as the data type, the only score available is the Conditional Gaussian BIC. Should I be looking somewhere else for the degenerate Gaussian score you mentioned?

If I am using FGES-MB do I perhaps need to check whether my features have linear relationships with each other?

The end goal is to use the graph to find the correct adjustment set so that I can model the counterfactual for each instance given the other treatments. Is there perhaps a better way to do this?

jdramsey commented 4 years ago

I guess we haven't published a new version of Tetrad for a while. Try grabbing the development version with this link ("Jeremy's Magic Link"):

https://cloud.ccd.pitt.edu/latest-dev-tetrad.html

I'm pretty sure that should have the degenerate Gaussian in it.

SamuelKingwill commented 4 years ago

Thank you for Jeremy's Magic Link. Much appreciated.

I tried the Degenerate Gaussian, but still no result. I simply have the data feeding into a search block, as shown below. From what I can tell from the manual, this should be fine?

[Screenshot: data box connected to a search box]

jdramsey commented 4 years ago

Try setting the maximum degree to 2 to see if you get a graph. I have a feeling you're in a combinatorial marsh.
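A back-of-the-envelope count of why limiting the degree helps: PC-style searches test independence given subsets of a node's neighbours, and GES-style insert operators also range over subsets of a node's neighbours, so the worst-case work per node grows with the number of subsets of size at most d. The sketch counts subsets of size 0..d drawn from m candidate variables. The value m = 87 is an assumption (about 89 variables from the 84 features, the target and four one-hot treatment columns, minus the pair being examined), and the count is a worst case, not a measurement of FGES.

```python
from math import comb


def subsets_up_to(m, d):
    """Number of subsets of size 0..d that can be drawn from m candidate variables."""
    return sum(comb(m, k) for k in range(d + 1))


m = 87  # assumption: about 89 variables in total, minus the pair being examined
for d in (2, 4, 8, m):  # d = m means no degree limit at all
    print(f"max degree {d}: {subsets_up_to(m, d)} candidate subsets")
```

With d = 2 the count is 3,829; with no limit it is 2^87, which is why capping the degree can turn a search that never finishes into one that does.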

SamuelKingwill commented 4 years ago

Still nothing. I tried on a smaller data set with only 9 features and it generated a graph. Is there anything you would suggest?

jdramsey commented 4 years ago

Hm.......maybe we need to look under the hood...can you send me a sample dataset or is it private?

SamuelKingwill commented 4 years ago

It is unfortunately confidential. Let me speak to my supervisor to see if I can make a plan. Thank you so much for all your help. I will be in touch.

cg09 commented 4 years ago

Hello,

I am late to this conversation. I don't know what search algorithm you are running, but when there may be a lot of variables and dense graphs, I suggest trying FGES with a very high penalty, like 40, or PC with a very, very small rejection level, like 10^-6. If that returns results, the penalty can be systematically lowered until the program hangs. That all takes some time.

Clark Glymour
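The "start high, then lower" procedure can be scripted if the search is run from the command line. In this sketch, `build_command` is hypothetical: it should return whatever command line runs your search at penalty p (no particular tool or flags are assumed), and halving the penalty from 40 is just one possible schedule. Each run gets a time limit; the first run that exceeds it is treated as "hanging", and the previous penalty is reported.

```python
import subprocess
import sys


def penalty_schedule(start=40.0, factor=0.5, floor=1.0):
    """Descending penalties: start, start*factor, ..., ending with floor."""
    values = []
    p = start
    while p > floor:
        values.append(p)
        p *= factor
    values.append(floor)
    return values


def lowest_feasible_penalty(build_command, penalties, time_limit_s):
    """Run the search at each penalty in turn; return the last one that finished in time.

    build_command(p) is a hypothetical callback returning the argv list that runs
    your search at penalty p. subprocess.run kills the child process if it exceeds
    time_limit_s, so a hanging run does not block the loop. With check=True, a
    search that exits with an error raises CalledProcessError.
    """
    best = None
    for p in penalties:
        try:
            subprocess.run(build_command(p), timeout=time_limit_s, check=True)
        except subprocess.TimeoutExpired:
            break  # treat this penalty as "hanging" and keep the previous one
        best = p
    return best


if __name__ == "__main__":
    # Stand-in "search": finishes at once for penalties >= 10, otherwise sleeps 60 s.
    def fake_search(p):
        delay = 0 if p >= 10 else 60
        return [sys.executable, "-c", f"import time; time.sleep({delay})"]

    print(lowest_feasible_penalty(fake_search, penalty_schedule(), time_limit_s=5))
```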

SamuelKingwill commented 4 years ago

Hello Professor Glymour, thank you for the input.

Apologies for the delay; I am in South Africa, and the time zones don't align very well.

I tried what you have suggested above but to no avail. I am still waiting on confirmation from my supervisor but will get back to you as soon as I can.