bd2kccd / causal-cmd

16 stars 8 forks source link

Issue with algorithm fges-mb #74

Open Zarmas opened 2 years ago

Zarmas commented 2 years ago

Hello,

I tried using the fges-mb algorithm, in an attempt to work around the issue I had when running the fges algorithm. The system I use has 14 threads and 128gb RAM. The data I enter are approximately 20000 continuous variables and 3000 samples. One of the commands I used was the following:

java -Xmx128G -jar ~/bin/causal-cmd/causal-cmd-1.4.1-SNAPSHOT-jar-with-dependencies.jar --data-type continuous --default --delimiter tab --json-graph --algorithm fges-mb --dataset dataset.txt --target targetname --score sem-bic-score --penaltyDiscount 10.0 --maxDegree 100

As with fges, the largest data size it could handle was 2000 variables with 100 samples, printing this additional message: "heuristicSpeedup = false". The --parallelized option was not available for this algorithm, but it could run parallel when using the --default switch. The issue with that was that I could not change the other options for the algorithm, even if I specified them in the command (e.g. in the log it is "verbose: yes" even if I specify "--verbose no" in the command). Is there a way to get parallel execution while being able to choose the other algorithm options?

After trying to run the fges-mb algorithm with a larger data size I got errors. When used a 2000 variable-2000sample dataset, I got "Exception in thread "main" java.util.ConcurrentModificationException". When I used an even larger dataset(i.e. 10000variables-3000samples) I got the following error line: Exception in thread "main" java.lang.NullPointerException. After displaying the errors, the executable exits.

I attach the full error messages to this ticket. I tried running the fges-mb algorithm with multiple different variables as target. Is there a way to get around these errors?

One other issue I had was when I was using the fges-mb algorithm on the whole dataset, it took too long to even display the message, approximately one hour, meaning it takes too long and repeating it for every variable will take years.

All the tests were done on the same 14 thread and 128gb RAM system, but the errors on the fges-mb algorithm were also presented on a different system with less ram and fewer threads. On the paper accompanying the caucal-cmd executable(Ramsey Et al. 2017), it is stated that it can be used for a million variables and more, but it seems impossible to use it on more than 2000 variables, on an above average system.

Thank you in advance, George ConcurrentModificationException.txt NullPointerException.txt

jdramsey commented 2 years ago

Hmmm... the paper you cite was for an extremely sparse model with average degree 2; for denser problems it will be slower. But can I ask right away which version of causal-cmd you are using? The latest one is in Maven Central, here:

https://s01.oss.sonatype.org/content/repositories/releases/io/github/cmu-phil/causal-cmd/1.4.1/

On Mon, Aug 1, 2022 at 9:20 AM gZarmalis @.***> wrote:

Hello,

I tried using the fges-mb algorithm, in an attempt to work around the issue I had when running the fges algorithm. The system I use has 14 threads and 128gb RAM. The data I enter are approximately 20000 continuous variables and 3000 samples. One of the commands I used was the following:

java -Xmx128G -jar ~/bin/causal-cmd/causal-cmd-1.4.1-SNAPSHOT-jar-with-dependencies.jar --data-type continuous --default --delimiter tab --json-graph --algorithm fges-mb --dataset dataset.txt --target targetname --score sem-bic-score --penaltyDiscount 10.0 --maxDegree 100

As with fges, the largest data size it could handle was 2000 variables with 100 samples, printing this additional message: "heuristicSpeedup = false". The --parallelized option was not available for this algorithm, but it could run parallel when using the --default switch. The issue with that was that I could not change the other options for the algorithm, even if I specified them in the command (e.g. in the log it is "verbose: yes" even if I specify "--verbose no" in the command). Is there a way to get parallel execution while being able to choose the other algorithm options?

After trying to run the fges-mb algorithm with a larger data size I got errors. When used a 2000 variable-2000sample dataset, I got "Exception in thread "main" java.util.ConcurrentModificationException". When I used an even larger dataset(i.e. 10000variables-3000samples) I got the following error line: Exception in thread "main" java.lang.NullPointerException. After displaying the errors, the executable exits.

I attach the full error messages to this ticket. I tried running the fges-mb algorithm with multiple different variables as target. Is there a way to get around these errors?

One other issue I had was when I was using the fges-mb algorithm on the whole dataset, it took too long to even display the message, approximately one hour, meaning it takes too long and repeating it for every variable will take years.

All the tests were done on the same 14 thread and 128gb RAM system, but the errors on the fges-mb algorithm were also presented on a different system with less ram and fewer threads. On the paper accompanying the caucal-cmd executable(Ramsey Et al. 2017), it is stated that it can be used for a million variables and more, but it seems impossible to use it on more than 2000 variables, on an above average system.

Thank you in advance, George ConcurrentModificationException.txt https://github.com/bd2kccd/causal-cmd/files/9234203/ConcurrentModificationException.txt NullPointerException.txt https://github.com/bd2kccd/causal-cmd/files/9234204/NullPointerException.txt

— Reply to this email directly, view it on GitHub https://github.com/bd2kccd/causal-cmd/issues/74, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACLFSRYQ6MQMJUYTDW4CGH3VW7FLNANCNFSM55HRKRFA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

jdramsey commented 2 years ago

Also, have you increased the heap size for Java by including a flag like -Xmx10g for e.g. 10 gigs?

On Mon, Aug 1, 2022 at 12:35 PM Joseph Ramsey @.***> wrote:

Hmmm... the paper you cite was for an extremely sparse model with average degree 2; for denser problems it will be slower. But can I ask right away which version of causal-cmd you are using? The latest one is in Maven Central, here:

https://s01.oss.sonatype.org/content/repositories/releases/io/github/cmu-phil/causal-cmd/1.4.1/

On Mon, Aug 1, 2022 at 9:20 AM gZarmalis @.***> wrote:

Hello,

I tried using the fges-mb algorithm, in an attempt to work around the issue I had when running the fges algorithm. The system I use has 14 threads and 128gb RAM. The data I enter are approximately 20000 continuous variables and 3000 samples. One of the commands I used was the following:

java -Xmx128G -jar ~/bin/causal-cmd/causal-cmd-1.4.1-SNAPSHOT-jar-with-dependencies.jar --data-type continuous --default --delimiter tab --json-graph --algorithm fges-mb --dataset dataset.txt --target targetname --score sem-bic-score --penaltyDiscount 10.0 --maxDegree 100

As with fges, the largest data size it could handle was 2000 variables with 100 samples, printing this additional message: "heuristicSpeedup = false". The --parallelized option was not available for this algorithm, but it could run parallel when using the --default switch. The issue with that was that I could not change the other options for the algorithm, even if I specified them in the command (e.g. in the log it is "verbose: yes" even if I specify "--verbose no" in the command). Is there a way to get parallel execution while being able to choose the other algorithm options?

After trying to run the fges-mb algorithm with a larger data size I got errors. When used a 2000 variable-2000sample dataset, I got "Exception in thread "main" java.util.ConcurrentModificationException". When I used an even larger dataset(i.e. 10000variables-3000samples) I got the following error line: Exception in thread "main" java.lang.NullPointerException. After displaying the errors, the executable exits.

I attach the full error messages to this ticket. I tried running the fges-mb algorithm with multiple different variables as target. Is there a way to get around these errors?

One other issue I had was when I was using the fges-mb algorithm on the whole dataset, it took too long to even display the message, approximately one hour, meaning it takes too long and repeating it for every variable will take years.

All the tests were done on the same 14 thread and 128gb RAM system, but the errors on the fges-mb algorithm were also presented on a different system with less ram and fewer threads. On the paper accompanying the caucal-cmd executable(Ramsey Et al. 2017), it is stated that it can be used for a million variables and more, but it seems impossible to use it on more than 2000 variables, on an above average system.

Thank you in advance, George ConcurrentModificationException.txt https://github.com/bd2kccd/causal-cmd/files/9234203/ConcurrentModificationException.txt NullPointerException.txt https://github.com/bd2kccd/causal-cmd/files/9234204/NullPointerException.txt

— Reply to this email directly, view it on GitHub https://github.com/bd2kccd/causal-cmd/issues/74, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACLFSRYQ6MQMJUYTDW4CGH3VW7FLNANCNFSM55HRKRFA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

Zarmas commented 2 years ago

Yes, I am using the latest version and I have increased the heap size. You can see the entire command I use on the first message of the post.

jdramsey commented 2 years ago

Hmmmm.... 20,000 variables should not be a problem, unless the data loader can't load it (@kvb2univpitt ?). The execution time may be due to the density of the graph. Let me try a 20,000 variable problem with N = 3000 in the Tetrad GUI...

jdramsey commented 2 years ago

Wait, is this continuous or discrete or mixed?

Zarmas commented 2 years ago

Continuous.

jdramsey commented 2 years ago

Oh wow, this is breaking the GUI. Hmm.. let me think how I did that before for the million variable problem. I was using different data structures at the time, a different simulator ('large scale simulator", which we still have), and a different covariance matrix class (which calculated covariances on the fly instead of storing them in a covariance matrix). The data loading is @kvb2univpitt's, so hopefully he can comment on that. I suppose I could give you a version that uses the covariance matrix on the fly. For 3000 samples it might be a bit slow but it should finish for you, depending on the density of your true model.

jdramsey commented 2 years ago

I'm attending UAI virtually in the Netherlands since 3 AM this morning so am a bit sleep deprived, but if I can catch some sleep maybe I could work on this for you--that is, the parts that I can.

jdramsey commented 2 years ago

Wait, you should be able to tell if the data loading is successful. Is it?

Zarmas commented 2 years ago

This is the last message on the .txt: Start search: Tue, August 02, 2022 02:28:44 PM

kvb2univpitt commented 2 years ago

Sorry for joining in the conversation late. If the search has started, that means the data has already been read in. So, it seems like the data loading part is fine. Based on the attached error files, it seems to break when running FGES. The reason why you're getting ConcurrentModificationException is because the data structure, HashMap in this case, is being modified by one thread and being read by another thread. @jdramsey It seems like parallelization might not be working correctly.

jdramsey commented 1 year ago

@Zarmas Sorry just coming back to looking at these causal-cmd issues now. I will try this again tomorrow.

jdramsey commented 1 year ago

@Zarmas Actually we were having some concurrency problems with FGES in another project and I think I just fixed them there. I'm thinking I just fixed them for your application as well. I looked at your error messages and think I've addressed those problem. (Due to concurrency, some nodes were being removed and then "removed" again, causing issues.)

We're going to do a new release soon of Tetrad and causal-cmd; hopefully these problems will go away.

Zarmas commented 1 year ago

@jdramsey Hello again, I am still getting the "ConcurrentModificationException" error message when using the fges-mb algorithm. The full error message is shown in the original message of this post. I have also encountered a problem when using the fges algorithm and posted it on a closed relative to fges issue, but can't re-open it since I wasn't the one that closed it.

jdramsey commented 1 year ago

Huh, this method that's throwing the exception is in line 408 in the current version 7.2.2 code, not line 647 as in the exception message--do you know which version you're using currently?

public boolean isAdjacentTo(Node node1, Node node2) { if (node1 == null || node2 == null || this.edgeLists.get(node1) == null || this.edgeLists.get(node2) == null) { return false; }

// Trying to fix a concurency problem.
for (Edge edge : this.edgeLists.get(node1)) {
    if (Edges.traverse(node1, edge) == node2 &&

Edges.traverse(node2, edge) == node1) { return true; } }

return false;

}

On Wed, Mar 1, 2023 at 2:03 PM gZarmalis @.***> wrote:

@jdramsey https://github.com/jdramsey Hello again, I am still getting the "ConcurrentModificationException" error message when using the fges-mb algorithm. The full error message is shown in the original message of this post. I have also encountered a problem when using the fges algorithm and posted it on a closed relative to fges issue, but can't re-open it since I wasn't the one that closed it.

— Reply to this email directly, view it on GitHub https://github.com/bd2kccd/causal-cmd/issues/74#issuecomment-1450703532, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACLFSR72V2Z7BF7LN2MIYXDWZ6MOJANCNFSM55HRKRFA . You are receiving this because you were mentioned.Message ID: @.***>

jdramsey commented 1 year ago

Let me try to find the original posting...

On Wed, Mar 1, 2023 at 2:31 PM Joseph Ramsey @.***> wrote:

Huh, this method that's throwing the exception is in line 408 in the current version 7.2.2 code, not line 647 as in the exception message--do you know which version you're using currently?

public boolean isAdjacentTo(Node node1, Node node2) { if (node1 == null || node2 == null || this.edgeLists.get(node1) == null || this.edgeLists.get(node2) == null) { return false; }

// Trying to fix a concurency problem.
for (Edge edge : this.edgeLists.get(node1)) {
    if (Edges.traverse(node1, edge) == node2 && Edges.traverse(node2, edge) == node1) {
        return true;
    }
}

return false;

}

On Wed, Mar 1, 2023 at 2:03 PM gZarmalis @.***> wrote:

@jdramsey https://github.com/jdramsey Hello again, I am still getting the "ConcurrentModificationException" error message when using the fges-mb algorithm. The full error message is shown in the original message of this post. I have also encountered a problem when using the fges algorithm and posted it on a closed relative to fges issue, but can't re-open it since I wasn't the one that closed it.

— Reply to this email directly, view it on GitHub https://github.com/bd2kccd/causal-cmd/issues/74#issuecomment-1450703532, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACLFSR72V2Z7BF7LN2MIYXDWZ6MOJANCNFSM55HRKRFA . You are receiving this because you were mentioned.Message ID: @.***>

jdramsey commented 1 year ago

Oh, you're using causal-cmd 1.4.1. I wonder what would happen if you switched to version 1.6.1 of causal-cmd, which uses version 7.2.2 of Tetrad? It looks like I made an effort to fix that problem, but I don't know whether I was successful.

On Wed, Mar 1, 2023 at 2:32 PM Joseph Ramsey @.***> wrote:

Let me try to find the original posting...

On Wed, Mar 1, 2023 at 2:31 PM Joseph Ramsey @.***> wrote:

Huh, this method that's throwing the exception is in line 408 in the current version 7.2.2 code, not line 647 as in the exception message--do you know which version you're using currently?

public boolean isAdjacentTo(Node node1, Node node2) { if (node1 == null || node2 == null || this.edgeLists.get(node1) == null || this.edgeLists.get(node2) == null) { return false; }

// Trying to fix a concurency problem.
for (Edge edge : this.edgeLists.get(node1)) {
    if (Edges.traverse(node1, edge) == node2 && Edges.traverse(node2, edge) == node1) {
        return true;
    }
}

return false;

}

On Wed, Mar 1, 2023 at 2:03 PM gZarmalis @.***> wrote:

@jdramsey https://github.com/jdramsey Hello again, I am still getting the "ConcurrentModificationException" error message when using the fges-mb algorithm. The full error message is shown in the original message of this post. I have also encountered a problem when using the fges algorithm and posted it on a closed relative to fges issue, but can't re-open it since I wasn't the one that closed it.

— Reply to this email directly, view it on GitHub https://github.com/bd2kccd/causal-cmd/issues/74#issuecomment-1450703532, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACLFSR72V2Z7BF7LN2MIYXDWZ6MOJANCNFSM55HRKRFA . You are receiving this because you were mentioned.Message ID: @.***>

Zarmas commented 1 year ago

I am using the latest version, the original error was in a previous one, but I am still getting it in the latest one. This is the new error message:

Exception in thread "main" java.util.ConcurrentModificationException at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1584) at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1607) at edu.cmu.tetrad.graph.EdgeListGraph.isAdjacentTo(EdgeListGraph.java:414) at edu.cmu.tetrad.search.FgesMb.calculateArrowsForward(FgesMb.java:1021) at edu.cmu.tetrad.search.FgesMb.access$600(FgesMb.java:57) at edu.cmu.tetrad.search.FgesMb$1AdjTask.compute(FgesMb.java:995) at edu.cmu.tetrad.search.FgesMb$1AdjTask.compute(FgesMb.java:933) at java.base/java.util.concurrent.RecursiveTask.exec(RecursiveTask.java:95) at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) at java.base/java.util.concurrent.ForkJoinTask.tryExternalHelp(ForkJoinTask.java:381) at java.base/java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:323) at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:398) at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:721) at java.base/java.util.concurrent.ForkJoinPool.invoke(ForkJoinPool.java:2432) at edu.cmu.tetrad.search.FgesMb.reevaluateForward(FgesMb.java:1016) at edu.cmu.tetrad.search.FgesMb.fes(FgesMb.java:810) at edu.cmu.tetrad.search.FgesMb.search(FgesMb.java:233) at edu.cmu.tetrad.algcomparison.algorithm.oracle.cpdag.FgesMb.search(FgesMb.java:70) at edu.pitt.dbmi.causal.cmd.tetrad.TetradRunner.lambda$runSearch$3(TetradRunner.java:214) at java.base/java.lang.Iterable.forEach(Iterable.java:75) at edu.pitt.dbmi.causal.cmd.tetrad.TetradRunner.runSearch(TetradRunner.java:213) at edu.pitt.dbmi.causal.cmd.tetrad.TetradRunner.runAlgorithm(TetradRunner.java:130) at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.runTetrad(CausalCmdApplication.java:149) at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.main(CausalCmdApplication.java:113)

jdramsey commented 1 year ago

I'm curious why I can't reproduce the error. A couple of questions. Can you tell me what platform you're on and what version of Java you're using? Maybe there's a known issue

jdramsey commented 1 year ago

Another thing I could try to make that a ConcurrentHashMap instead of a HashMap...

jdramsey commented 1 year ago

I'll put this here to think about:

https://stackoverflow.com/questions/602636/why-is-a-concurrentmodificationexception-thrown-and-how-to-debug-it

jdramsey commented 1 year ago

@Zarmas also could you give me some hints as to how to reproduce the error? I mean aside from sending me your data?

Zarmas commented 1 year ago

I am using it on ubuntu. The java version of the error message is 15.0.2 but I also got it on java 18.0.2 and java 11.0.18. This error appears when I am trying to run a sample size of 150 or more. It runs with no problems with 100 samples.

Zarmas commented 1 year ago

While I ran it on 20000 variables and a low sample size with no problems, when I tried to run it on a smaller variable size but with a larger sample size and got the concurrentModificationException error. Seems that the large sample size is what is causing it.

Zarmas commented 1 year ago

Another issue I came across while running the fges-mb algorithm, is that changing the maxDegree parameter changed the results, even if the degree of the graph didn't reach the number I set. For example with max degree of 10 I got a graph with 4 nodes and 3 edges, and with max degree 20 I got a graph with 2 nodes and 1 edge. All the other parameters and target were the same.

jdramsey commented 1 year ago

@Zarmas Sorry, I'm still thinking about this last one. I don't think it would solve anything, but I should update the FGES-MB code to use all of the updates to FGES since the last time I did that. I'm in the middle of other things, but I'll get to that.

jdramsey commented 1 year ago

@zarmas I'm considering redoing the FGES-MB algorithm to use the latest FGES code. I may do that, not for the next release next week but for a subsequent release. But in light of your comments, I'm considering removing the max degree parameter from the code. Is that parameter helpful? If you set the parameter too low, you may impact the results, but it may be misleading if you don't know what to set it to.

Zarmas commented 1 year ago

I use the max degree parameter in order to make it easier to run with a lower penalty. For example I could run with penalty 4 only if I set maxDegree to 10. I understand that if there are more than 10 degrees associated with a node it may not show them, but it is curious to me why changing the max degree changes the results, with the same target and both results having less degrees than what was set in the parameter as said here: https://github.com/bd2kccd/causal-cmd/issues/74#issuecomment-1456027592