Closed immanuelazn closed 1 month ago
Added in the following changes
pseudobulk_matrix()
and matrix_quantile_per_cell()
. These functions now follow the same styling as other functions, with their own cpp and header files.pseudobulk_matrix()
calculates everything within a single pass, rather than separate passes for each aggregation method. Using inspiration from calculateMatrixStats()
pseudobulk_matrix()
now returns a named list built from a struct, as we calculate the simpler aggregation methods during our pass anyways.matrix_quantile_per_cell()
has been restricted to usage in non transposed matrices. Should the user call this on a transposed matrix, they will be advised to call transpose_storage_order()
first.change matrix_quantile_per_cell()
to S3 generalizable colQuantile()
function
change colQuantile()
to use type 7 quantile calculation
change pseudobulk_matrix()
to use a numeric clip_values representing quantile rather than boolean set to .99
change pseudobulk_matrix()
to return a single matrix rather than a named list when only one method given
remove matrix_quantile_per_row()
fix problem with sum calculation in pseudobulk_matrix()
when requesting more complex method
various documentation changes
Few notes:
rowQuantiles()
colQuantiles()
to return a matrix if multiple probs are givencolQuantiles()
to work with any of the continuous quantile algorithms (4-9)PseudobulkStatMethod
to use bitlagspseudobulk_matrix()
to not check the transpose conditional with every value check in the matrix. Also cleaned up the code and incorporated aforementioned bit flagsI think I have addressed all comments, I also gave it a pass with manually checking everything. As for the two points that you have put up, I put them into an issuue within the projects page, for a new PR. Thanks for being so patient and detailed with your review Ben
matrixStats
functions, so we can ensure correct syntax and behaviourPseudobulkStatsMethod
bit flagging to no longer be cast as ints.colQuantiles()
to match matrixStats
Description
Added function
matrix_quantile_per_cell()
to find the nth quantile value of each cell in a matrix. Allows for clipping usingmin_by_row()
andmin_by_col()
Addedpseudobulk_matrix()
with option to clip by quantile, and to aggregate by non-zeros, mean, sum, variance.Tests
pseudobulk_matrix()
, add tests for clipping, non-zeros, mean, sum, and variance in both transpose states.matrix_quantile_per_cell()
, add test comparing against R quantile function across transpose states and quantile valuesDetails
I've iterated on
pseudobulk_matrix()
a few times as shown in commit history. I tried to be a little bit smarter by using matrix multiplies to utilize multi-threading. However, the solution for non-zeros and variance is probably non-optimal due to requiring the additional iterative functions.These iterative functions are not multi-threaded, which makes me think I should have utilized a strategy like
computeMatrixStats()
inConcat{Cols,Rows}
. I think a better way would be to create a child class inheriting fromMatrixLoader
that finds matrix subsets based off of cell grouping. Then utilize the defaultMatrixLoader<T>::computeMatrixStats()
and manipulate it to fit the output matrix we're looking for. This would probably allow for use of threading, while also limiting the amount of duplicate code.Follow-up checklist:
(Added during code review)
matrixStats.R
file that collects all of the matrixStats interface functionsmatrixStats
interfaces as a page, and then maybe split up the arithmetic functions into a few groups too