immanuelazn commented 2 months ago

Description

Added function matrix_quantile_per_cell() to find the nth quantile value of each cell in a matrix. Allows for clipping using min_by_row() and min_by_col() Added pseudobulk_matrix() with option to clip by quantile, and to aggregate by non-zeros, mean, sum, variance.

Tests

For pseudobulk_matrix(), add tests for clipping, non-zeros, mean, sum, and variance in both transpose states.
For matrix_quantile_per_cell(), add test comparing against R quantile function across transpose states and quantile values

Details

I've iterated on pseudobulk_matrix() a few times as shown in commit history. I tried to be a little bit smarter by using matrix multiplies to utilize multi-threading. However, the solution for non-zeros and variance is probably non-optimal due to requiring the additional iterative functions.

These iterative functions are not multi-threaded, which makes me think I should have utilized a strategy like computeMatrixStats() in Concat{Cols,Rows}. I think a better way would be to create a child class inheriting from MatrixLoader that finds matrix subsets based off of cell grouping. Then utilize the default MatrixLoader<T>::computeMatrixStats() and manipulate it to fit the output matrix we're looking for. This would probably allow for use of threading, while also limiting the amount of duplicate code.

Follow-up checklist:

(Added during code review)

[ ] Move row/colQuantiles into a different file. Perhaps make a matrixStats.R file that collects all of the matrixStats interface functions
[ ] Split function documentation to have the matrixStats interfaces as a page, and then maybe split up the arithmetic functions into a few groups too

immanuelazn commented 2 months ago

Added in the following changes

R interrupt checking as well as removal of Rcpp code from physical logic has been finished for pseudobulk_matrix() and matrix_quantile_per_cell(). These functions now follow the same styling as other functions, with their own cpp and header files.
pseudobulk_matrix() calculates everything within a single pass, rather than separate passes for each aggregation method. Using inspiration from calculateMatrixStats()
pseudobulk_matrix() now returns a named list built from a struct, as we calculate the simpler aggregation methods during our pass anyways.
matrix_quantile_per_cell() has been restricted to usage in non transposed matrices. Should the user call this on a transposed matrix, they will be advised to call transpose_storage_order() first.
Simple typo fixes and docstring clarity changes

immanuelazn commented 1 month ago

change matrix_quantile_per_cell() to S3 generalizable colQuantile() function change colQuantile() to use type 7 quantile calculation change pseudobulk_matrix() to use a numeric clip_values representing quantile rather than boolean set to .99 change pseudobulk_matrix() to return a single matrix rather than a named list when only one method given remove matrix_quantile_per_row() fix problem with sum calculation in pseudobulk_matrix() when requesting more complex method various documentation changes

immanuelazn commented 1 month ago

Few notes:

Added rowQuantiles()
Change colQuantiles() to return a matrix if multiple probs are given
Change colQuantiles() to work with any of the continuous quantile algorithms (4-9)
Added tests for checking other types of quantiles, overall consolidation of quantile tests
Changed PseudobulkStatMethod to use bitlags
Changes pseudobulk_matrix() to not check the transpose conditional with every value check in the matrix. Also cleaned up the code and incorporated aforementioned bit flags

immanuelazn commented 1 month ago

I think I have addressed all comments, I also gave it a pass with manually checking everything. As for the two points that you have put up, I put them into an issuue within the projects page, for a new PR. Thanks for being so patient and detailed with your review Ben

immanuelazn commented 1 month ago

Changed quantile tests to use matrixStats functions, so we can ensure correct syntax and behaviour
Reverted PseudobulkStatsMethod bit flagging to no longer be cast as ints.
Transposed colQuantiles() to match matrixStats

bnprks / BPCells

[r][cpp] add pseudobulking to matrices #128

Description

Tests

Details

Follow-up checklist: