brianhie / scanorama

Panoramic stitching of single cell data
http://scanorama.csail.mit.edu
MIT License

Much larger element count in output matrix #126

Closed by rvernica 2 years ago

rvernica commented 2 years ago

It looks like one of the corrected output matrices has significantly more stored elements than the corresponding input matrix. Is this expected?

The input looks like this:

[<5811x23034 sparse matrix of type '<class 'numpy.float64'>'
    with 15076894 stored elements in Compressed Sparse Row format>,
 <5447x23953 sparse matrix of type '<class 'numpy.float64'>'
    with 19462707 stored elements in Compressed Sparse Row format>]

The scanorama.correct log is:

Found 21890 genes among all datasets
[[0.         0.72021296]
 [0.         0.        ]]
Processing datasets (0, 1)

The corrected output looks like this:

[<5811x21890 sparse matrix of type '<class 'numpy.float64'>'
    with 15074955 stored elements in Compressed Sparse Row format>,
 <5447x21890 sparse matrix of type '<class 'numpy.float64'>'
    with 118935312 stored elements in Compressed Sparse Row format>]

So the inputs have about 15M and 19M stored elements, while the outputs have about 15M and 119M. Why is the second output so much larger than its input?
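One way to check whether the extra stored elements are real nonzero values or explicitly stored zeros is sketched below. It relies only on `scipy.sparse` and runs on synthetic matrices, because the actual datasets are not part of this report. `summarize` is a hypothetical helper name, not part of scanorama.

```python
from scipy import sparse


def summarize(mat):
    """Return shape, stored-element count, true nonzero count, and density."""
    mat = sparse.csr_matrix(mat)
    n_rows, n_cols = mat.shape
    stored = mat.nnz                 # entries held in the CSR arrays, explicit zeros included
    nonzero = mat.count_nonzero()    # entries whose value is actually nonzero
    return {
        "shape": (n_rows, n_cols),
        "stored": stored,
        "nonzero": nonzero,
        "density": stored / (n_rows * n_cols),
    }


# Synthetic stand-ins (assumption): a sparse "input" and a shifted copy
# in which every entry becomes nonzero, mimicking a densified "output".
before = sparse.random(50, 40, density=0.1, format="csr", random_state=0)
after = sparse.csr_matrix(before.toarray() + 0.5)

print(summarize(before))
print(summarize(after))
```

If `stored` and `nonzero` agree for a corrected matrix, the growth comes from genuinely nonzero values rather than from stored zeros.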


I retried on a subset of these datasets and saw the same pattern, except that this time it is the first output matrix that has significantly more elements.

Input:

[<321x421 sparse matrix of type '<class 'numpy.float64'>'
    with 20662 stored elements in Compressed Sparse Row format>,
 <659x486 sparse matrix of type '<class 'numpy.float64'>'
    with 61978 stored elements in Compressed Sparse Row format>]

scanorama.correct log:

Found 405 genes among all datasets
[[0.         0.87227414]
 [0.         0.        ]]
Processing datasets (0, 1)

Output:

[<321x405 sparse matrix of type '<class 'numpy.float64'>'
    with 129365 stored elements in Compressed Sparse Row format>,
 <659x405 sparse matrix of type '<class 'numpy.float64'>'
    with 61755 stored elements in Compressed Sparse Row format>]

Notice the first input matrix has 20K elements while the first output matrix has 129K elements.
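The reported counts can also be turned into densities (stored elements divided by rows times columns). The sketch below uses only the shapes and counts quoted in this report; `density` is a hypothetical helper name.

```python
# (rows, cols, stored elements) copied from the output matrix summaries above
outputs = {
    "full run, output 0": (5811, 21890, 15074955),
    "full run, output 1": (5447, 21890, 118935312),
    "subset run, output 0": (321, 405, 129365),
    "subset run, output 1": (659, 405, 61755),
}


def density(rows, cols, stored):
    """Fraction of matrix cells that are explicitly stored."""
    return stored / (rows * cols)


for label, (rows, cols, stored) in outputs.items():
    print(f"{label}: {density(rows, cols, stored):.4f}")
# full run, output 0: 0.1185
# full run, output 1: 0.9975
# subset run, output 0: 0.9951
# subset run, output 1: 0.2314
```

In both runs, one output stays about as sparse as its input while the other is more than 99% filled, so the larger matrix is effectively dense even though it is stored in CSR format.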