richpauloo closed this issue 3 years ago.
Thanks for posting here. `igraph` is supposed to work for graphs with 1MM+ vertices, but I do not know if those cases are typically relatively sparse graphs. `brainGraph`, on the other hand, was originally "only intended for" graphs with up to ~1,000 vertices or so (at least, I never really considered the issue of very large, dense graphs). It may be best to contact the `igraph` folks for their thoughts.
It is true that it would be difficult to determine what the cutoff would be. I did a bit of quick testing on my computer (laptop, 4-core @2.5 GHz i7-6500U) and found that at around 650-675 vertices, the parallel solution started to become faster (at first, by only a few milliseconds or so). My solution at first glance would be to simply check the vertex count and have a switch that uses `foreach` to loop across vertices, or stick with the function as-is; i.e.,
```r
n <- vcount(g)
if (n > 650) {
  D <- foreach(i=seq_len(n), .combine='rbind') %dopar% {
    distances(g, from=i, weights=weights)
  }
} else {
  ...
}
```
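For the `%dopar%` branch above to actually run in parallel, a backend has to be registered first; a minimal sketch using `doParallel` (the cluster size here is only for illustration):

```r
library(doParallel)

# Register a parallel backend; without this step, %dopar%
# falls back to sequential execution and emits a warning.
cl <- makeCluster(2)
registerDoParallel(cl)

# ... run the foreach() loop shown above ...

stopCluster(cl)
```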
The current `use.parallel` could jump to that automatically. Currently, I expect the user to set the number of cores and register the cluster at the start of the session (using code I give as examples in README.md and the User Guide). An alternative would be to have a global option a la the `boot` package (i.e., something like `brainGraph.parallel` and `brainGraph.ncpus`) and have code to use either `parallel` or `snow`. I would like to avoid this as it would lengthen every function using `foreach`, but it is certainly doable.
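As a rough sketch of what that boot-style approach could look like inside a function body (the option names `brainGraph.parallel` and `brainGraph.ncpus` are the ones suggested above; nothing here is implemented yet):

```r
# Hypothetical snippet at the top of a function body: read
# boot-style global options, with safe sequential defaults.
use_parallel <- getOption('brainGraph.parallel', default=FALSE)
ncpus <- getOption('brainGraph.ncpus', default=1L)

if (isTRUE(use_parallel) && ncpus > 1L) {
  cl <- parallel::makeCluster(ncpus)
  doParallel::registerDoParallel(cl)
  # Clean up the cluster when the enclosing function returns
  on.exit(parallel::stopCluster(cl), add=TRUE)
}
```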
I will think about this issue more and come up with a solution. I am not sure if it will make it into the next release (v3.0.0) which is coming soon.
As discussed in this Stack Overflow question, `brainGraph::efficiency()` uses `igraph::distances()` to calculate distance matrices. When the number of vertices is large, this can easily eat up all of the memory on a local machine.
A solution that @cwatson proposed on SO is to run the calculation in parallel for sufficiently large graphs. I'm unsure of where to draw this line.
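To get a feel for why memory is the binding constraint, note that a dense n-by-n distance matrix of doubles needs 8 * n^2 bytes, so memory use grows quadratically with vertex count:

```r
# Memory footprint of a dense n-by-n matrix of doubles, in GiB
dist_matrix_gib <- function(n) 8 * n^2 / 2^30

dist_matrix_gib(1e3)   # ~0.007 GiB: trivial
dist_matrix_gib(1e5)   # ~74.5 GiB: exceeds most workstations
```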
Moreover, since even parallel calculations can easily eat up memory and take hours to solve, I want to add that the best way to approach this might be to throw an error that prompts the user to either:

1. call a dedicated parallel function (e.g., `parallel_efficiency()`), or
2. set an argument that, when `TRUE`, evaluates the problem via {doParallel}, using 1/2 the cores by default, or the number of cores specified in a `cores` argument.
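A hedged sketch of the error-then-opt-in behavior described above; the wrapper name and the `parallel`/`cores` argument names are hypothetical, and half the detected cores is used as the default:

```r
# Hypothetical guard around the efficiency calculation;
# names and the 650-vertex threshold are illustrative only.
efficiency_guarded <- function(g, parallel=FALSE,
                               cores=max(1L, parallel::detectCores() %/% 2L)) {
  n <- igraph::vcount(g)
  if (n > 650 && !parallel) {
    stop('Graph has ', n, ' vertices; re-run with parallel=TRUE ',
         'to avoid exhausting memory on a large distance matrix.')
  }
  if (parallel) {
    cl <- parallel::makeCluster(cores)
    doParallel::registerDoParallel(cl)
    on.exit(parallel::stopCluster(cl), add=TRUE)
  }
  # ... proceed with the (possibly parallel) efficiency calculation ...
}
```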