corrpower takes too long

SystemsGenetics / KINC

Knowledge Independent Network Construction

MIT License

corrpower takes too long #126

Closed spficklin closed 4 years ago

spficklin commented 4 years ago

The corrpower filter, which removes edges that don't have sufficient power, is too slow. It's much, much slower than conditional filtering. One way to speed it up might be to keep a lookup table keyed on the input parameters, so that when the exact same combination has already been computed we reuse the stored result rather than recompute it.
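A rough sketch of the lookup-table idea, in Python rather than KINC's own code. Everything here is an assumption for illustration: the names `edge_power` and `cached_power` are hypothetical, the power formula is the standard Fisher z approximation for a two-sided correlation test (not necessarily the calculation corrpower uses), and the correlation is rounded to three decimals so that repeated keys actually occur.

```python
from functools import lru_cache
from math import atanh, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def _power(r, n, alpha):
    """Approximate power of a two-sided correlation test (Fisher z)."""
    if n < 4:
        return 0.0  # the approximation needs n - 3 > 0
    if abs(r) >= 1.0:
        return 1.0  # atanh is undefined at +/-1
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(atanh(r)) * sqrt(n - 3)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


@lru_cache(maxsize=None)
def cached_power(r_milli, n, alpha):
    """Lookup table keyed on (rounded |r| * 1000, sample size, alpha)."""
    return _power(r_milli / 1000.0, n, alpha)


def edge_power(r, n, alpha=0.001):
    """Power for one edge; identical (rounded r, n, alpha) keys hit the cache."""
    return cached_power(round(abs(r) * 1000), n, alpha)


if __name__ == "__main__":
    for r, n in [(0.9, 30), (0.9, 30), (0.5, 10)]:
        print(r, n, edge_power(r, n))
    print(cached_power.cache_info())
```

Whether this helps depends on how often the same key comes up. Raw floating-point correlations rarely repeat exactly, which is why the sketch rounds them first, at the cost of a small approximation.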

bentsherman commented 4 years ago

@spficklin I took a quick look at the corrpower analytic and I think the poor performance has to do with how the analytic iterates through the input data. It iterates through every pairwise index the same way similarity does, and tries to read a gene pair for each index. This results in a lot of unnecessary work because the correlation matrix is very sparse, so most of those indices have no stored pair.

If you look at other analytics such as export correlation matrix, extract, or conditional test, you will see that these analytics iterate directly through the gene pairs in the input data instead of testing every possible pairwise index. Of course, this makes it harder to split up the work into uniform chunks, but it seems you figured out how to do that with conditional test, so I would refer to that analytic.
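To make the difference concrete, here is a toy comparison in Python. It is only an illustration: the sparse matrix is modeled as a plain dict of stored pairs, and the function names `dense_scan` and `sparse_scan` are made up for this sketch rather than taken from KINC.

```python
from itertools import combinations


def dense_scan(n_genes, stored):
    """Slow pattern: visit every pairwise index and try to read a pair for it."""
    visited = 0
    found = []
    for pair in combinations(range(n_genes), 2):
        visited += 1
        if pair in stored:
            found.append((pair, stored[pair]))
    return found, visited


def sparse_scan(stored):
    """Fast pattern: visit only the pairs that are actually stored."""
    visited = 0
    found = []
    for pair, corr in sorted(stored.items()):
        visited += 1
        found.append((pair, corr))
    return found, visited


if __name__ == "__main__":
    # Hypothetical data: 1000 genes, but only three stored correlations.
    stored = {(0, 5): 0.91, (12, 700): -0.88, (300, 999): 0.95}
    dense_found, dense_visited = dense_scan(1000, stored)
    sparse_found, sparse_visited = sparse_scan(stored)
    print(dense_visited, sparse_visited)
    print(dense_found == sparse_found)
```

Both scans return the same pairs, but the dense one does work proportional to the number of possible pairs while the sparse one does work proportional to the number of stored pairs.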

spficklin commented 4 years ago

Thanks @bentsherman, I'll try to adjust it and see if that fixes things.

bentsherman commented 4 years ago

I just pushed some changes that should resolve the issues you've been having with cond-test and corrpower. If you look at the code you will see that cond-test, corrpower, and similarity follow a similar pattern in terms of work blocks and result blocks.

Additionally, while similarity does iterate through every pairwise combination, cond-test and corrpower need only iterate through the pairs in the sparse CCM/CMX files. In this respect you will see that they follow the same pattern as import-cmx, export-cmx, and extract.
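For anyone reading along, the general shape of that pattern can be sketched as follows. This is a simplified Python illustration, not the actual KINC code: `make_work_blocks` and `process_block` are hypothetical names, the stored pairs are a plain list, and a simple absolute-correlation threshold stands in for the real per-pair test.

```python
def make_work_blocks(n_pairs, n_blocks):
    """Split the stored-pair range into (start, size) blocks differing by at most one."""
    base, extra = divmod(n_pairs, n_blocks)
    blocks = []
    start = 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        if size:  # skip empty blocks when there are more blocks than pairs
            blocks.append((start, size))
        start += size
    return blocks


def process_block(pairs, start, size, min_abs_corr):
    """Produce one result block: the pairs in this work block that pass the filter."""
    return [
        (pair, corr)
        for pair, corr in pairs[start:start + size]
        if abs(corr) >= min_abs_corr
    ]


if __name__ == "__main__":
    # Hypothetical stored pairs, in file order.
    pairs = [((0, 1), 0.95), ((0, 7), 0.40), ((2, 3), -0.92),
             ((4, 9), 0.10), ((5, 6), 0.89)]
    kept = []
    for start, size in make_work_blocks(len(pairs), 2):
        kept.extend(process_block(pairs, start, size, min_abs_corr=0.85))
    print(kept)
```

Because the blocks are defined over positions in the stored-pair list rather than over pairwise indices, each worker gets roughly the same number of real pairs regardless of how sparse the matrix is.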

So now both cond-test and corrpower should work with MPI, and corrpower should be much faster. However, I don't have the necessary input data to test them thoroughly, so could you please have someone test them (including with MPI) and respond here with any issues they run into? If everything works, I'll close out these issues.

spficklin commented 4 years ago

Okay. I'll test it. Thanks.

spficklin commented 4 years ago

This problem was fixed by the code that @4ctrl-alt-del added in PR #132, but that PR was closed. I'm assuming the fix was carried over into the adjustments made by @bentsherman.