It looks like one of the corrected output matrices has significantly more stored elements than the corresponding input matrix. Is this expected?
The input looks like this:
[<5811x23034 sparse matrix of type '<class 'numpy.float64'>'
with 15076894 stored elements in Compressed Sparse Row format>,
<5447x23953 sparse matrix of type '<class 'numpy.float64'>'
with 19462707 stored elements in Compressed Sparse Row format>]
The scanorama.correct log is this:
Found 21890 genes among all datasets
[[0. 0.72021296]
[0. 0. ]]
Processing datasets (0, 1)
The corrected output looks like this:
[<5811x21890 sparse matrix of type '<class 'numpy.float64'>'
with 15074955 stored elements in Compressed Sparse Row format>,
<5447x21890 sparse matrix of type '<class 'numpy.float64'>'
with 118935312 stored elements in Compressed Sparse Row format>]
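So the inputs have 15M and 19M stored elements, while the outputs have 15M and 118M. The second output stores 118935312 of a possible 5447 x 21890 = 119234830 entries, so it is almost fully dense. Why is the second output so much larger than its input? Is this expected?
I retried on a subset of these datasets and got the same pattern, except that now it is the first output matrix that has significantly more elements: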
Input:
[<321x421 sparse matrix of type '<class 'numpy.float64'>'
with 20662 stored elements in Compressed Sparse Row format>,
<659x486 sparse matrix of type '<class 'numpy.float64'>'
with 61978 stored elements in Compressed Sparse Row format>]
scanorama.correct log:
Found 405 genes among all datasets
[[0. 0.87227414]
[0. 0. ]]
Processing datasets (0, 1)
Output:
[<321x405 sparse matrix of type '<class 'numpy.float64'>'
with 129365 stored elements in Compressed Sparse Row format>,
<659x405 sparse matrix of type '<class 'numpy.float64'>'
with 61755 stored elements in Compressed Sparse Row format>]
Notice that the first input matrix has 20K stored elements while the first output matrix has 129K, which is 129365 of a possible 321 x 405 = 130005 entries, so it is again almost fully dense.
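For anyone who wants to reproduce the comparison, here is a minimal sketch of how the numbers above can be checked with scipy alone. The helper name `sparsity_report` and the toy matrix are my own illustrations, not part of scanorama. It reports the stored element count, how many of those are explicit zeros, and the density for each matrix in a list.

```python
import numpy as np
from scipy import sparse


def sparsity_report(matrices):
    """Return (shape, stored elements, explicit zeros, density) per matrix.

    Hypothetical helper for illustration; it is not part of scanorama.
    """
    report = []
    for m in matrices:
        m = sparse.csr_matrix(m)
        stored = m.nnz  # number of stored entries, including explicit zeros
        explicit_zeros = int(stored - np.count_nonzero(m.data))
        density = stored / (m.shape[0] * m.shape[1])
        report.append((m.shape, stored, explicit_zeros, density))
    return report


# Toy stand-in for a list of matrices (random data, not the real datasets).
toy = [sparse.random(321, 405, density=0.15, format="csr", random_state=0)]
for shape, stored, zeros, density in sparsity_report(toy):
    print(shape, stored, zeros, f"{density:.3f}")
```

Running the same helper on the input list and on the corrected output list should show which matrix became dense and whether the extra stored elements are real nonzero values or explicit zeros.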