bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

[r][cpp] add pseudobulking to matrices #128

Closed immanuelazn closed 1 month ago

immanuelazn commented 2 months ago

Description

Added function matrix_quantile_per_cell() to find the nth quantile value of each cell in a matrix. Allows for clipping using min_by_row() and min_by_col(). Added pseudobulk_matrix() with an option to clip by quantile and to aggregate by non-zeros, mean, sum, or variance.

Tests

Details

I've iterated on pseudobulk_matrix() a few times, as the commit history shows. I tried to be a bit smarter by using matrix multiplies to take advantage of multi-threading. However, the solution for non-zeros and variance is probably not optimal, because it requires the additional iterative functions.

These iterative functions are not multi-threaded, which makes me think I should have used a strategy like computeMatrixStats() in Concat{Cols,Rows}. I think a better way would be to create a child class inheriting from MatrixLoader that finds matrix subsets based on cell grouping, then use the default MatrixLoader<T>::computeMatrixStats() and adapt its result to fit the output matrix we're looking for. This would probably allow threading while also limiting the amount of duplicate code.
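For readers following along, the matrix-multiply idea for the sum and mean methods can be sketched roughly as below. This is only an illustration on a small dense matrix, not the code in this PR: the `Matrix` alias and the functions `group_indicator()`, `matmul()`, `pseudobulk_sum()` and `pseudobulk_mean()` are hypothetical names, and it assumes a features x cells layout with integer group ids in `[0, n_groups)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix, used only for this illustration (hypothetical type).
using Matrix = std::vector<std::vector<double>>;

// Build a cells x groups one-hot membership matrix from per-cell group ids.
Matrix group_indicator(const std::vector<int>& cell_groups, int n_groups) {
    Matrix g(cell_groups.size(), std::vector<double>(n_groups, 0.0));
    for (std::size_t cell = 0; cell < cell_groups.size(); ++cell) {
        assert(cell_groups[cell] >= 0 && cell_groups[cell] < n_groups);
        g[cell][cell_groups[cell]] = 1.0;
    }
    return g;
}

// Plain dense product of a (n x k) and b (k x m).
Matrix matmul(const Matrix& a, const Matrix& b) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    Matrix out(a.size(), std::vector<double>(b[0].size(), 0.0));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t k = 0; k < b.size(); ++k)
            for (std::size_t j = 0; j < b[0].size(); ++j)
                out[i][j] += a[i][k] * b[k][j];
    return out;
}

// Per-group sums: (features x cells) * (cells x groups) = features x groups.
Matrix pseudobulk_sum(const Matrix& mat, const std::vector<int>& cell_groups,
                      int n_groups) {
    return matmul(mat, group_indicator(cell_groups, n_groups));
}

// Per-group means: divide each column of the sums by that group's cell count.
Matrix pseudobulk_mean(const Matrix& mat, const std::vector<int>& cell_groups,
                       int n_groups) {
    Matrix out = pseudobulk_sum(mat, cell_groups, n_groups);
    std::vector<double> counts(n_groups, 0.0);
    for (int group : cell_groups) counts[group] += 1.0;
    for (auto& row : out)
        for (int j = 0; j < n_groups; ++j)
            if (counts[j] > 0.0) row[j] /= counts[j];  // empty groups stay 0
    return out;
}
```

In this sketch the only heavy step is the product itself, which is the part a multi-threaded multiply could speed up; the division for the mean is a cheap pass over the small features x groups result.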

Follow-up checklist:

(Added during code review)

immanuelazn commented 2 months ago

Added in the following changes

immanuelazn commented 1 month ago

- Change matrix_quantile_per_cell() to an S3 generalizable colQuantile() function
- Change colQuantile() to use the type 7 quantile calculation
- Change pseudobulk_matrix() to use a numeric clip_values representing the quantile, rather than a boolean set to .99
- Change pseudobulk_matrix() to return a single matrix rather than a named list when only one method is given
- Remove matrix_quantile_per_row()
- Fix a problem with the sum calculation in pseudobulk_matrix() when requesting a more complex method
- Various documentation changes
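As a reference point for the type 7 calculation mentioned above, here is a minimal sketch of that quantile definition (linear interpolation between order statistics, the default of R's quantile()) applied to one column of values, plus clipping at the resulting value. The function names `quantile_type7()` and `clip_to_quantile()` are made up for this example and are not the BPCells implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Type 7 quantile: with sorted values x[0..n-1] and h = (n - 1) * p,
// interpolate linearly between x[floor(h)] and the next order statistic.
double quantile_type7(std::vector<double> values, double p) {
    assert(!values.empty());
    assert(p >= 0.0 && p <= 1.0);
    std::sort(values.begin(), values.end());  // works on a local copy
    const double h = (static_cast<double>(values.size()) - 1.0) * p;
    const std::size_t lo = static_cast<std::size_t>(std::floor(h));
    const std::size_t hi = std::min(lo + 1, values.size() - 1);
    return values[lo] + (h - static_cast<double>(lo)) * (values[hi] - values[lo]);
}

// Clip every value in a column to at most the column's p-th quantile.
std::vector<double> clip_to_quantile(const std::vector<double>& values, double p) {
    const double cap = quantile_type7(values, p);
    std::vector<double> out(values);
    for (double& v : out) v = std::min(v, cap);
    return out;
}
```

For example, `quantile_type7({1, 2, 3, 4}, 0.5)` gives 2.5 under this definition, so clipping `{1, 2, 3, 100}` at the 0.5 quantile caps the two largest entries at 2.5.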

immanuelazn commented 1 month ago

Few notes:

immanuelazn commented 1 month ago

I think I have addressed all comments, and I also gave it a pass, manually checking everything. As for the two points that you raised, I put them into an issue within the projects page, for a new PR. Thanks for being so patient and detailed with your review, Ben.

immanuelazn commented 1 month ago