cmap / cmapPy

Assorted tools for interacting with .gct, .gctx files and other Connectivity Map (Broad Institute) data/tools
https://clue.io/cmapPy/index.html
BSD 3-Clause "New" or "Revised" License

math.tests/fast_cov and fast_corr: added methods for calculating when nan's are present #59

Closed dllahr closed 5 years ago

dllahr commented 5 years ago

Added methods to calculate covariance and correlation when NaN values are present, skipping the NaNs. Uses linear algebra and other numpy methods (instead of loops) to keep it as fast as possible.
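For readers unfamiliar with the approach, here is a minimal sketch of how a NaN-skipping covariance can be computed with matrix products instead of loops. It is an illustration only, not the code in this PR: the function name `pairwise_nan_cov` is hypothetical, and it assumes variables are in columns and that each pair of columns should use only the rows where both values are present.

```python
import numpy as np


def pairwise_nan_cov(x):
    """Covariance between the columns of x (hypothetical helper).

    For each pair of columns, only rows where both values are non-NaN
    contribute. Pairs with fewer than 2 shared rows are returned as NaN.
    """
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    mask = valid.astype(float)
    x0 = np.where(valid, x, 0.0)  # NaNs replaced by 0 so they add nothing

    n = mask.T @ mask      # n[i, j]: rows where columns i and j are both present
    sum_i = x0.T @ mask    # sum_i[i, j]: sum of column i over those shared rows
    sum_j = sum_i.T        # sum_j[i, j]: sum of column j over those shared rows
    sum_ij = x0.T @ x0     # sum_ij[i, j]: sum of products over those shared rows

    # sum((xi - mean_i) * (xj - mean_j)) == sum_ij - sum_i * sum_j / n
    with np.errstate(divide="ignore", invalid="ignore"):
        cov = (sum_ij - sum_i * sum_j / n) / (n - 1.0)
    cov[n < 2] = np.nan
    return cov


# Small usage example with roughly 20% of the entries missing
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
x[rng.random(x.shape) < 0.2] = np.nan
result = pairwise_nan_cov(x)
```

One caveat with this single-pass formula: subtracting `sum_i * sum_j / n` from `sum_ij` can lose precision when the column means are large relative to their spread, so centering the data first may be worthwhile in that case.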

levlitichev commented 5 years ago

This is a great addition! The tests failed because it looks like there's a stray pdb in test_fast_corr.py.

dllahr commented 5 years ago

Thanks Lev! Should be fixed now. In other news, I used it to run a Spearman calculation between ~17k vectors. When I did this with a script that looped over them, it took ~10 days. With nan_fast_spearman today, it finished in less than an hour.

87% of the values show less than a 10% discrepancy between the two calculations.

(edited to use fractional difference instead of absolute difference)