cmap / cmapPy

Assorted tools for interacting with .gct, .gctx files and other Connectivity Map (Broad Institute) data/tools
https://clue.io/cmapPy/index.html
BSD 3-Clause "New" or "Revised" License

math/fast_cov.py and math/fast_corr.py: add math routines to calculate covariance and correlation matrices (the latter uses the former) #34

Closed: dllahr closed this 6 years ago

dllahr commented 6 years ago

Following the pattern of mortar, I added routines to calculate covariance and correlation. The covariance routine is included because it is the workhorse for calculating correlation, so we might as well provide it separately. Although numpy (and other libraries) provide covariance and correlation calculations, they do not easily provide a way to calculate them across 2 separate matrices/arrays of numbers, except by concatenating the two matrices and then calculating the results for every pair. This can be unnecessarily time-consuming if, for example, you have a single vector that you want to correlate against a large matrix.
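To illustrate the idea described above, here is a minimal NumPy sketch of computing covariance and correlation between the columns of two separate arrays without concatenating them. It is not the code from this PR: the names `cross_cov` and `cross_corr`, the columns-as-variables layout, and the `ddof=1` default are assumptions made for the example.

```python
import numpy as np


def _as_columns(a):
    """Return a as a 2-D float array; a 1-D input becomes a single column."""
    a = np.asarray(a, dtype=float)
    return a[:, np.newaxis] if a.ndim == 1 else a


def cross_cov(x, y, ddof=1):
    """Covariance of every column of x with every column of y.

    Hypothetical helper: x is (n, p), y is (n, q), result is (p, q).
    Rows are observations; missing values are not handled.
    """
    x, y = _as_columns(x), _as_columns(y)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    xc = x - x.mean(axis=0)  # center each column
    yc = y - y.mean(axis=0)
    return xc.T @ yc / (x.shape[0] - ddof)


def cross_corr(x, y):
    """Pearson correlation of every column of x with every column of y."""
    x, y = _as_columns(x), _as_columns(y)
    cov = cross_cov(x, y, ddof=1)
    # Scale by the outer product of the column standard deviations.
    return cov / np.outer(x.std(axis=0, ddof=1), y.std(axis=0, ddof=1))


# One vector against a large matrix: only a (1, 1000) result is computed,
# not the full (1001, 1001) matrix that concatenation would produce.
rng = np.random.default_rng(0)
vector = rng.normal(size=50)
matrix = rng.normal(size=(50, 1000))
r = cross_corr(vector, matrix)
```

The saving comes from skipping the matrix-versus-matrix block entirely; the result should match the corresponding slice of `np.corrcoef` applied to the concatenated data.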

oena commented 6 years ago

@dllahr thanks for this! Added some comments where I had a couple questions.

dllahr commented 6 years ago

@oena thanks Oana for looking at this so quickly!

levlitichev commented 6 years ago

Just remembered one more thing. Does this handle missing values at all?

dllahr commented 6 years ago

@levlitichev The alternative to the sketchiness would be to invert the axis argument. I don't think that is going away, one way or another, but I do agree that the axis keyword/convention needs to be changed.

It does not handle missing values at all; that's one of the things that makes it fast! We could add some pre-processing methods to handle those and then call fast_corr.
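As one possible shape for such a pre-processing step, the sketch below drops every observation (row) that has a missing value in either input, so the remaining data can be passed to a routine that assumes complete data. The helper name `drop_incomplete_rows` and the rows-as-observations layout are assumptions for illustration; nothing like it is confirmed to exist in cmapPy.

```python
import numpy as np


def drop_incomplete_rows(x, y):
    """Keep only the rows where neither x nor y contains NaN.

    Hypothetical helper: x and y share their first dimension (observations).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    # Flatten trailing dimensions so 1-D and 2-D inputs are treated alike.
    x_rows = x.reshape(x.shape[0], -1)
    y_rows = y.reshape(y.shape[0], -1)
    keep = ~(np.isnan(x_rows).any(axis=1) | np.isnan(y_rows).any(axis=1))
    return x[keep], y[keep]


x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([[1.0, 9.0],
              [2.0, np.nan],
              [3.0, 7.0],
              [4.0, 6.0],
              [5.0, 5.0]])
x_clean, y_clean = drop_incomplete_rows(x, y)  # rows 1 and 2 are removed
```

This is listwise deletion: a single missing entry discards the whole observation for every column. Pairwise handling would keep more data, but each pair of columns would then use a different set of rows, which gives up the single matrix product that makes the fast path fast.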

levlitichev commented 6 years ago

Mm, having support for missing values is definitely on my wish-list, but agree elsewhere is better.

dllahr commented 6 years ago

Thank you @oena @levlitichev for the feedback. I've made the suggested changes and pushed updates to GitHub, including adding fast_spearman. Let me know what you think!
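For context on the Spearman addition: Spearman correlation is Pearson correlation computed on ranks, so it can reuse the same cross-matrix product. The sketch below shows that relationship in plain NumPy. It is an assumption about the general approach, not the PR's fast_spearman; the names `rank_columns` and `cross_spearman` are hypothetical, and ties are given their average rank.

```python
import numpy as np


def rank_columns(a):
    """Rank each column independently, giving tied values their average rank."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, np.newaxis]
    ranks = np.empty_like(a)
    for j in range(a.shape[1]):
        _, inverse, counts = np.unique(
            a[:, j], return_inverse=True, return_counts=True
        )
        # Average of the 1-based positions occupied by each distinct value.
        average_rank = np.cumsum(counts) - (counts - 1) / 2.0
        ranks[:, j] = average_rank[inverse]
    return ranks


def cross_spearman(x, y):
    """Spearman correlation of every column of x with every column of y."""
    rx, ry = rank_columns(x), rank_columns(y)
    rx = rx - rx.mean(axis=0)  # center the ranks
    ry = ry - ry.mean(axis=0)
    norms = np.outer(
        np.sqrt((rx ** 2).sum(axis=0)), np.sqrt((ry ** 2).sum(axis=0))
    )
    return rx.T @ ry / norms


increasing = cross_spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
decreasing = cross_spearman([1, 2, 3, 4, 5], [25, 16, 9, 4, 1])
with_ties = cross_spearman([1, 2, 2, 3], [1, 2, 3, 4])
```

Like the Pearson case, this assumes complete data, and a constant column produces a division by zero.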