Open hpages opened 1 month ago
Thanks a lot @hpages, this is really great! We have started the transition in the use_NAmatrix branch. Many things already work nicely, including the construction of the SummarizedExperiment objects with the DataFrame of NaArrays as the assay. As you mentioned, having row operations would be great, and we're also currently getting an error when attempting to apply logical operations (e.g. here). But already very exciting! We'll keep you posted here about our progress.
rowSums(<NaArray>), <NaArray> >= <some value>, and <NaArray> != <some value> added to SparseArray >= 1.5.34
So with this latest version you should be able to do rowSums(y >= 0.5) and rowSums(y != 0) on your NaMatrix object y. Note that you probably want to use na.rm=TRUE to keep the (implicit) NAs out of the operation.
Let me know what other row*() methods you'll need, if any (I only implemented rowSums() for now). It'll help me prioritize.
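The calls mentioned above can be tried on a small toy object. This is only a sketch: it assumes SparseArray >= 1.5.34 is installed, and the object y and its values are made up for illustration.

```r
library(SparseArray)

# Toy 5 x 12 NaMatrix (made-up values): every cell is NA except 10 of them.
y <- NaArray(dim=c(5, 12))
midx <- cbind(c(1:5, 5:1), 2:11)
y[midx] <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)

# Per-row count of non-NA values that are >= 0.5.
# na.rm=TRUE keeps the (implicit) NAs out of the sums.
n_high <- rowSums(y >= 0.5, na.rm=TRUE)

# Per-row count of non-NA values that are different from zero.
n_nonzero <- rowSums(y != 0, na.rm=TRUE)
```

Without na.rm=TRUE, any row containing an NA would be expected to sum to NA, as with an ordinary matrix.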
Very cool, thanks! This seems to work as expected. I think from our side, if we can make a wish for the next step 🙂, it would be nnavals (which you mention above is on the todo list) - I just ran through the unit tests and that seems to be the missing piece.
nnavals() implemented in SparseArray 1.5.39. Let me know if there are other things you need for footprintR.
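For reference, a minimal sketch of how the three nna*() functions fit together, assuming SparseArray >= 1.5.39. The toy object is made up, and the comment about the ordering of nnavals() is an assumption by analogy with nzvals() and nzwhich(), not something confirmed in this thread.

```r
library(SparseArray)

# Toy 5 x 12 NaMatrix with 10 non-NA values (made-up data).
nam <- NaArray(dim=c(5, 12))
midx <- cbind(c(1:5, 5:1), 2:11)
nam[midx] <- (1:10) / 10

n    <- nnacount(nam)   # number of non-NA values
idx  <- nnawhich(nam)   # linear indices of the non-NA values
vals <- nnavals(nam)    # the non-NA values themselves
                        # (assumed to be in the same order as idx)
```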
Thank you! 🤩
@csoneson @mbstadler Let's use this issue to discuss the migration from SparseMatrix to NaMatrix.
Some background
NaArray and NaMatrix objects are still under development in the SparseArray package. They are similar to SparseArray objects but the background value is NA instead of zero. I started to work on these objects at the beginning of August before going on vacation, had to put them on the back burner for most of August, and just resumed my work on these objects yesterday with the implementation of a bunch of statistical methods. See https://github.com/Bioconductor/SparseArray/commit/185dc371b7c2418e268d9e63a5a65e3deaadb949
Some important features are still missing, e.g. there are still no row summarization methods like rowSums(), rowVars(), etc., only the col*() methods for now. I'll add the row*() methods later today (this will be in SparseArray 1.5.33). The package also needs a short vignette dedicated to NaArray and NaMatrix objects.
Note that the nz*() functions (nzcount(), nzwhich(), and nzvals()) that are available for SparseArray objects are replaced with nna*() functions nnacount(), nnawhich(), and nnavals() (nnavals() not ready yet). This is because the nz prefix -- which stands for "nonzero" -- is not adequate for NaArray objects, so in the new names it's replaced with the nna prefix, which stands for "non-na". I'm not a big fan of the double negation here ("non not available") so if you can think of something better please let me know.
Two easy ways to construct an NaMatrix or NaArray object are:
as(<object>, "NaArray")
or construct an empty object like the above and add the non-na values to it with something like:
midx <- cbind(c(1:5, 5:1), 2:11)
nam[midx] <- runif(10)
nnacount(nam)
[1] 10
nnawhich(nam)
[1]  6 12 18 24 30 35 39 43 47 51
nnawhich(nam, arr.ind=TRUE)
      [,1] [,2]
 [1,]    1    2
 [2,]    2    3
 [3,]    3    4
 [4,]    4    5
 [5,]    5    6
 [6,]    5    7
 [7,]    4    8
 [8,]    3    9
 [9,]    2   10
[10,]    1   11
In particular, please note that coercion back and forth between dgCMatrix and NaMatrix is not supported because such coercion doesn't really make sense as it would need to replace 0's with NA's, and vice-versa, in order to preserve the sparse representation. I don't think this would be a great behavior. So to go back and forth between dgCMatrix and NaMatrix, one will need to do something like this:
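A minimal sketch of one possible round trip, going through an SVT_SparseMatrix as the intermediate. This is an illustration under assumptions, not necessarily the approach the package will end up offering: the toy dgCMatrix is made up, nnavals() requires SparseArray >= 1.5.39, and the sketch relies on nzwhich()/nzvals() and nnawhich()/nnavals() returning positions and values in matching order.

```r
library(Matrix)
library(SparseArray)

# Toy dgCMatrix (made-up data) with 3 nonzero values.
dgcm <- sparseMatrix(i=c(1, 3, 2), j=c(1, 2, 4), x=c(0.5, 1.5, 2.5),
                     dims=c(3, 4))

## dgCMatrix -> NaMatrix: nonzero values are kept, zeros become (implicit) NAs.
svt <- as(dgcm, "SVT_SparseMatrix")
nam <- NaArray(dim=dim(svt), dimnames=dimnames(svt))
nam[nzwhich(svt, arr.ind=TRUE)] <- nzvals(svt)

## NaMatrix -> dgCMatrix: non-NA values are kept, NAs become (implicit) zeros.
svt2 <- SVT_SparseArray(dim=dim(nam), dimnames=dimnames(nam), type="double")
svt2[nnawhich(nam, arr.ind=TRUE)] <- nnavals(nam)
dgcm2 <- as(svt2, "dgCMatrix")
```

Note that both directions change the meaning of the background value, which is exactly why a direct as() coercion would be misleading.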
This might go in dedicated functions.
Anyways, you should not need the intermediate dgCMatrix representation anymore in footprintR::readModkitExtract() if you replace the SparseMatrix object with an NaMatrix object.
The footprintR use case
Right now the readModkitExtract() function in footprintR constructs a SparseMatrix object where non-available values are represented with zeros, and zeros are represented with the value 0.02. This is achieved through the combination of this line and these lines.
An NaMatrix object would be more natural.
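To illustrate why, here is a small sketch with made-up data showing that an NaMatrix can hold genuine zeros next to missing values, so no stand-in value like 0.02 is needed. The vectors row_idx, col_idx, and the toy mod_prob values are hypothetical and are not taken from footprintR.

```r
library(SparseArray)

# Hypothetical observations: one (row, column) pair per value.
# Two of the observed values are genuine zeros.
row_idx  <- c(1L, 2L, 2L, 3L)
col_idx  <- c(1L, 1L, 2L, 2L)
mod_prob <- c(0, 0.8, 0.3, 0)

# Start from an all-NA matrix and fill in only the observed cells.
nam <- NaArray(dim=c(3, 2))
nam[cbind(row_idx, col_idx)] <- mod_prob

# All 4 observed values are counted, zeros included; the remaining
# 2 cells stay NA (i.e. "not available").
n_observed <- nnacount(nam)
```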
To construct the NaMatrix, replace lines 219-225:
with the following lines:
Also this line (line 166) would need to be removed:
And this line (line 218):
would need to be replaced with:
The reason for this replacement is to preserve the original behavior, which drops rows for which call_code is not "-" and mod_prob is zero.
Finally, imports would need to be adjusted, e.g. the following line at the top of the R/readModkitExtract.R file:
would need to be replaced with:
and the #' @importFrom Matrix sparseMatrix line would need to be removed. Also the doc of the function would need to be modified to reflect these changes.
Feedback welcome. Don't hesitate to use the issue tracker in the SparseArray repo to report bugs or request features.
Let me know if you have questions.