parallelization doesn't seem to work properly

tzcoolman commented 8 years ago

Hi Joe,

This is Enze from the CCD summer short course 2016. We've been using causal-cmd to generate gene regulatory networks from our expression data, but we are currently having trouble getting results. Our dataset is a 100 × 15000 matrix of continuous values, and we cannot get results within 72 hours (the program is stopped by our job runtime limit). When we ran it again and monitored the process with top, it looked as if only one core was in use most of the time, even though we explicitly specified "--thread 16". Does that sound normal?

PS: Here are the parameters from the header of the result file:

Runtime Parameters:
verbose = true
number of threads = 16

Dataset:
file = test.input
delimiter = tab
cases read in = 14206
variables read in = 107

FGS Parameters:
penalty discount = 3.000000
depth = -1

Run Options:
heuristic speedup = true
ignore linear dependence = false

Data Validations:
ensure variable names are unique = true
ensure variables have non-zero variance = true

Regards,
Enze

jdramsey commented 8 years ago

What I would do is try a much higher penalty discount. Try 100, for instance, just to see if it finishes. Then lower the penalty discount to the point where it continues to finish in reasonable time. I'm guessing the issue is that the graph is so dense that it bogs down the algorithm. Raising the penalty discount should help to clear out the graph.
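To see why the penalty discount controls graph density, here is a minimal sketch. It assumes a linear Gaussian BIC-style score of the form 2·logL − c·k·ln(n), where c plays the role of the penalty discount; Tetrad's exact score may differ in constants. Under that assumption, adding one parent with (partial) correlation r changes the score by −n·ln(1 − r²) − c·ln(n), so an edge only helps when |r| clears a threshold that grows with c. The function names are hypothetical, and n = 14206 is taken from the "cases read in" line of the header above.

```python
import math


def bic_gain_single_parent(r: float, n: int, penalty_discount: float) -> float:
    """Score change from adding one parent with (partial) correlation r.

    Assumes a linear Gaussian score 2*loglik - c*k*ln(n): the residual
    variance shrinks by a factor (1 - r^2) and one parameter is added.
    """
    return -n * math.log(1.0 - r * r) - penalty_discount * math.log(n)


def min_abs_correlation(n: int, penalty_discount: float) -> float:
    """Smallest |r| at which adding the edge stops lowering the score."""
    threshold = penalty_discount * math.log(n) / n
    return math.sqrt(1.0 - math.exp(-threshold))


if __name__ == "__main__":
    n = 14206  # "cases read in" from the result-file header
    for c in (3.0, 10.0, 30.0, 100.0):
        print(f"penalty discount {c:>5}: edges need |r| > {min_abs_correlation(n, c):.3f}")
```

With this many cases, a penalty discount of 3 admits edges with |r| above only about 0.045, while 100 raises the bar to about 0.255. That is consistent with the suggestion above: at a low penalty, many weak dependencies enter the graph, and the denser graph makes the search much slower.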

jdramsey commented 8 years ago

I'm going to close this to push it out of the list for the time being. If you're still having trouble please reopen it and add some comments.

cmu-phil/tetrad issue #267: parallelization doesn't seem to work properly