chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/

Address `pinv` performance #941

Closed atolopko-czi closed 7 months ago

atolopko-czi commented 7 months ago

As of commit 2601348ba5ab7c04c1ce69166640a3e5ee4e0f04, the `numpy.linalg.pinv` call is very expensive and accounts for effectively all of the run time. See the `de_wls_stats_pinv` timing line below:

/usr/bin/time -v python -O -m memento.diff_expr "tissue_general_ontology_term_id in ['UBERON:0002405']" sex_ontology_term_id /mnt/census/estimators-cube-70a7705-float32/ 1 10
DEBUG:root:computing for 145 obs groups (49874 cells) and 10 features using 1 processes, 10 features/process
DEBUG:root:design shape: (145, 6231)
DEBUG:root:computing for feature group 0, n=10, ENSG00000183878..ENSG00000259707...
DEBUG:root:computing feature
DEBUG:root:pinv m.shape=(6231, 6231)
DEBUG:root:computing feature
DEBUG:root:pinv m.shape=(6231, 6231)
DEBUG:root:computed for feature group 0, ENSG00000183878..ENSG00000259707, result size=2
[timing 22904] compute_for_feature: cum_time=53.466328713009716 sec; avg_time=26.733; calls=2
[timing 22904] de_wls: cum_time=53.458166673008236 sec; avg_time=26.729; calls=2
[timing 22904] de_wls_stats: cum_time=53.35123460300383 sec; avg_time=26.676; calls=2
[timing 22904] de_wls_stats_pinv: cum_time=53.12332922501082 sec; avg_time=26.562; calls=2
[timing 22904] de_wls_stats_matmul: cum_time=0.22296729500521906 sec; avg_time=0.111; calls=2
[timing 22904] de_wls_fit: cum_time=0.10688159501296468 sec; avg_time=0.053; calls=2
[timing 22904] query_estimators: cum_time=0.10440379299689084 sec; avg_time=0.104; calls=1
[timing 22904] fill_missing_data: cum_time=0.005330480009433813 sec; avg_time=0.003; calls=2
[timing 22904] drop_invalid_data: cum_time=0.0005337570037227124 sec; avg_time=0.001; calls=1
[timing 22904] transform_to_log_space: cum_time=0.00011587899643927813 sec; avg_time=0.000; calls=2
[timing 22904] de_wls_stats_W: cum_time=4.28729981649667e-05 sec; avg_time=0.000; calls=2

`pinv` receives a square matrix whose dimension is determined by the number of columns in the design (dummy-encoded) matrix. That column count, in turn, is a function of the number of distinct values across all of the covariate dimensions. For example, if cell type were a covariate, each distinct cell type would add a column to the design matrix, and thus enlarge the matrix argument passed to `pinv`.
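
A minimal sketch of that scaling (hypothetical covariate names and a placeholder weight vector; interpreting `m` as a WLS normal-equations matrix is an assumption, not taken from the memento code):

```python
import numpy as np
import pandas as pd

# Toy obs-group table with two covariates of differing cardinality
# (hypothetical column names, not the actual cube schema).
obs = pd.DataFrame(
    {
        "sex": ["male", "female"] * 72 + ["male"],
        "cell_type": [f"cell_type_{i % 50}" for i in range(145)],
    }
)

# One-hot encode the covariates to build the design matrix X.
X = pd.get_dummies(obs, dtype=float).to_numpy()
print(X.shape)  # (145, 52): 2 sex levels + 50 cell_type levels

# Assuming a WLS-style normal-equations matrix, the pinv argument is
# (n_cols, n_cols), so every additional covariate level enlarges it.
W = np.ones(X.shape[0])           # placeholder per-group weights
m = X.T @ (W[:, None] * X)
print(m.shape)                    # (52, 52)
m_pinv = np.linalg.pinv(m)
```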

The current computation, which relies on `pinv`, may need to be replaced with a more efficient approach.
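
One possible direction, sketched under the assumption that `pinv(m)` is only used to obtain the WLS coefficients; if the pseudoinverse itself is needed downstream (e.g. for standard errors), the trade-off differs:

```python
# Assumption: beta = pinv(X.T @ W @ X) @ X.T @ W @ y is all that pinv is used
# for. Solving the weighted system directly with lstsq operates on the
# (n_obs, n_cols) design instead of the (n_cols, n_cols) normal-equations
# matrix, which is much cheaper when n_obs << n_cols, as in the log above.
import numpy as np

def wls_coefficients(X: np.ndarray, W: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Minimum-norm WLS solution without forming pinv(X.T @ W @ X) explicitly."""
    sw = np.sqrt(W)
    # lstsq returns the minimum-norm least-squares solution, which matches the
    # pinv-based solution of the normal equations for rank-deficient designs.
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
    return beta
```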

Note, however, that the matrix passed to `pinv` should be much smaller for real use cases, where not all cube dimensions are wanted as covariates. Currently, all dimensions are used as covariates, but this will change with #942. Once that is implemented and the user can specify particular covariates, the `m` matrix should be much smaller, which will improve performance. In particular, `pinv` cost grows superlinearly with the matrix dimension (roughly cubic for an SVD-based pseudoinverse), so performance may be acceptable for most use cases with a limited set of covariates. It's probably worth implementing #942 first and then testing with "real world" (scientifically useful) queries.
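
The scaling is easy to observe with a rough, illustrative timing loop (numbers will vary by machine and BLAS build):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
for n in (250, 500, 1000, 2000):
    m = rng.standard_normal((n, n))
    start = time.perf_counter()
    np.linalg.pinv(m)
    print(f"n={n}: {time.perf_counter() - start:.2f}s")
```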

atolopko-czi commented 7 months ago

The `pinv` performance problem was caused by a bug in how `pandas.get_dummies()` was being called. Because categorical columns were passed in, `get_dummies()` produced a column for every category, rather than only for the values actually appearing in the rows' data. This caused the returned matrix to be massive (roughly 6000×6000), which in turn made `pinv` very slow. Fixing the input data passed to `get_dummies()` rectified the performance problem.
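
A minimal illustration of that behavior (not the actual change in the referenced commit): a categorical column yields one dummy column per category, even for categories absent from the data, whereas encoding only the observed values keeps the matrix small.

```python
import pandas as pd

# A categorical Series whose dtype knows about 6000 categories,
# of which only 2 appear in the data.
all_cell_types = [f"cell_type_{i}" for i in range(6000)]
subset = pd.Series(
    pd.Categorical(["cell_type_0", "cell_type_1"], categories=all_cell_types)
)

print(pd.get_dummies(subset).shape)              # (2, 6000): one column per category
print(pd.get_dummies(subset.astype(str)).shape)  # (2, 2): only observed values

# Dropping unused categories before encoding has the same effect.
print(pd.get_dummies(subset.cat.remove_unused_categories()).shape)  # (2, 2)
```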

Fixed in 861641a65ae19dd297b2168b6f4e89e924e120b6.