bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

Bug when trying to index IterableMatrix #11

Closed Gesmira closed 1 year ago

Gesmira commented 1 year ago

Hi @bnprks, thanks in advance for continuing to maintain such a useful package!

I've recently run into an edge case when indexing an IterableMatrix with an empty set of features (0 features). I do this, for example, when trying to get the percentage of cells that express mitochondrial features. Doing this with a sparse matrix returns a matrix with no rows. The IterableMatrix output suggests the same thing is happening, but running downstream functions on the object shows this is not the case.
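For reference, here is a minimal sketch of the sparse-matrix behavior described above, using only the Matrix package. The toy matrix, gene names, and cell names are invented for illustration and are not taken from the dataset in this report.

```r
library(Matrix)

# Toy 3 x 4 sparse matrix; gene and cell names are made up for illustration.
m <- sparseMatrix(
  i = c(1, 3, 2),
  j = c(1, 2, 4),
  x = c(1.5, 2.0, 0.5),
  dims = c(3, 4),
  dimnames = list(c("GENE1", "GENE2", "GENE3"), paste0("cell", 1:4))
)

# No row names start with "MT-", so the selection is an empty character vector.
features <- grep("^MT-", rownames(m), value = TRUE)

# Indexing with the empty selection keeps all columns and drops every row.
empty <- m[features, , drop = FALSE]
dim(empty)       # 0 rows, 4 columns
colSums(empty)   # one zero per cell
```

This is the result I would expect the IterableMatrix subset to match: zero rows, and column sums of zero for every cell.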

For example, I create a test IterableMatrix with 20 rows and 20 cells:

> test
20 x 20 IterableMatrix object with class MatrixSubset

Row names: MIR1302-2HG, FAM138A ... RNF223
Col names: cell1, cell2 ... cell20

Data type: float
Storage order: column major

Queued Operations:
1. Load compressed matrix from directory ~/test
2. Select rows: 1, 2 ... 37 and cols: 1, 2 ... 20
> as(test, "dgCMatrix")
20 x 20 sparse Matrix of class "dgCMatrix"
  [[ suppressing 20 column names 'cell1', 'cell2', 'cell3' ... ]]

MIR1302-2HG .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
FAM138A     .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
OR4F5       .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
OR4F29      .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
OR4F16      .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
LINC01409   .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
FAM87B      .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
LINC01128   .        .        . .        .        .        . .        . . . . . 1.021765 .         . . . .        .        
LINC00115   .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
FAM41C      .        .        . .        .        0.784961 . .        . . . . . .        .         . . . .        .        
LINC02593   .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
SAMD11      .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
NOC2L       1.512336 .        . 1.792176 1.421819 .        . 1.216483 . . . . . .        0.9432759 . . . .        0.8852077
KLHL17      .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
PLEKHN1     .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
PERM1       .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
HES4        .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        
ISG15       .        1.404183 . 2.398350 3.264126 .        . .        . . . . . 1.021765 0.9432759 . . . 2.555951 .        
AGRN        .        .        . .        1.421819 .        . .        . . . . . .        .         . . . .        .        
RNF223      .        .        . .        .        .        . .        . . . . . .        .         . . . .        .        

Subsetting the matrix with 0 features appears to correctly give a 0 x 20 matrix. However, colSums of the subsetted matrix returns values greater than 0, and converting the subsetted matrix to a sparse matrix gives a strange result: a 37412 x 20 matrix with <NA> row names.

> features <- grep("^MT-", rownames(test), value = T)
> features
character(0)
> test[features,]
0 x 20 IterableMatrix object with class MatrixSubset

Row names: unknown names
Col names: cell1, cell2 ... cell20

Data type: float
Storage order: column major

Queued Operations:
1. Load compressed matrix from directory ~/test
2. Select rows: all and cols: 1, 2 ... 20
> colSums(test[features,])
   cell1    cell2    cell3    cell4    cell5    cell6    cell7    cell8    cell9   cell10   cell11   cell12   cell13   cell14   cell15 
2203.956 2367.250 1867.033 1742.972 2418.026 2665.404 1226.463 1975.834 2367.414 2239.525 2029.724 1633.182 1250.033 2427.025 2412.731 
  cell16   cell17   cell18   cell19   cell20 
2366.497 2084.104 1745.754 1774.932 2716.428 

> as(test[features,], "dgCMatrix")
37412 x 20 sparse Matrix of class "dgCMatrix"
  [[ suppressing 20 column names 'cell1', 'cell2', 'cell3' ... ]]
  [[ suppressing 20 column names 'cell1', 'cell2', 'cell3' ... ]]

<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . 1.021765 . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . 0.784961 . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .
<NA> . . . . . .        . . . . . . . .        . . . . . .

 ..............................
 ........suppressing 37362 rows in show(); maybe adjust 'options(max.print= *, width = *)'
 ..............................
  [[ suppressing 20 column names 'cell1', 'cell2', 'cell3' ... ]]

<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . 1.021765 . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .
<NA> . . . . . . . . . . . . . .        . . . . . .

Indexing with 1 or more features works fine; this only occurs with 0. Thanks for any advice or solutions you may have!

Best, Gesi

bnprks commented 1 year ago

Interesting corner case -- thanks for the clear bug report. This should be fixed now in 71d2de704fa97d61a6c0e1c4690e3c9217957f37.
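For anyone on a version that predates that commit, one possible workaround is to guard against the empty selection before indexing. The sketch below is only an illustration: `col_sums_subset` is a hypothetical helper, not part of BPCells, and the demo runs on a base R matrix with invented names. Whether `drop = FALSE` is accepted by other matrix classes is an assumption to check before reusing it.

```r
# Hypothetical helper (not part of BPCells): column sums over a feature subset,
# returning zeros when the subset is empty instead of indexing with it.
col_sums_subset <- function(mat, features) {
  if (length(features) == 0) {
    return(setNames(numeric(ncol(mat)), colnames(mat)))
  }
  colSums(mat[features, , drop = FALSE])
}

# Demo on a base R matrix; gene and cell names are made up for illustration.
mat <- matrix(
  c(1, 0, 2, 0, 3, 0),
  nrow = 3,
  dimnames = list(c("GENE1", "GENE2", "MT-ND1"), c("cell1", "cell2"))
)

mt_features <- grep("^MT-", rownames(mat), value = TRUE)   # one match
no_features <- grep("^RPS", rownames(mat), value = TRUE)   # no matches

# Percentage of each cell's total that comes from the selected features.
pct_mt   <- 100 * col_sums_subset(mat, mt_features) / colSums(mat)
pct_none <- 100 * col_sums_subset(mat, no_features) / colSums(mat)
```

With the guard in place, an empty selection yields 0 percent for every cell rather than depending on how the matrix class handles empty indices.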