Computational-Morphogenomics-Group / MarkerMap

Marker selection, supervised and unsupervised
MIT License

Error sparse array length is ambiguous; use getnnz() or shape[0] #54

Closed. Rafael-Silva-Oliveira closed this issue 5 months ago

Rafael-Silva-Oliveira commented 6 months ago

When trying to use the first function to split the data I get this error:

---> 12 train_indices, val_indices, test_indices = split_data(
     13     adata.X,
     14     adata.obs[group_by],
     15     [0.7, 0.1, 0.2],
     16 )
     17 train_val_indices = np.concatenate([train_indices, val_indices])
     19 train_dataloader, val_dataloader = MarkerMap.prepareData(
     20     adata,
     21     train_indices,
   (...)
     25     batch_size=batch_size,
     26 )

File /site-packages/markermap/utils.py:1153
   1150 That set of data must have at least that number of representatives of each group in y

-> 1153 assert len(X) == len(y)

TypeError: sparse array length is ambiguous; use getnnz() or shape[0]

My adata.X is:


<1786x33567 sparse matrix of type '<class 'numpy.float32'>'
    with 1869211 stored elements in Compressed Sparse Row format>

I already tried converting it with todense() and toarray(), but that didn't work.

WilsonGregory commented 6 months ago

Hi Rafael, hmm, this does look like a sparse array issue. How did you try converting it to a dense array? Take a look at the mouse brain loading function for how we do it: https://github.com/Computational-Morphogenomics-Group/MarkerMap/blob/main/src/markermap/utils.py#L1012

So it would be something like: load the data into adata, then do adata.X = adata.X.toarray() before passing adata.X to split_data.
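A minimal sketch of why the assert fails and why the assignment matters, using only NumPy and SciPy. The small `X_sparse` matrix is a made-up stand-in for `adata.X`, and `to_dense` is a hypothetical helper, not a MarkerMap function. Note that `toarray()` returns a new array and leaves the original untouched, so the result has to be assigned back (as in `adata.X = adata.X.toarray()`) for later calls to see a dense array.

```python
import numpy as np
from scipy import sparse

# Made-up stand-in for adata.X: 3 cells x 4 genes in CSR format.
X_sparse = sparse.csr_matrix(
    np.array([[0, 1, 0, 2],
              [0, 0, 3, 0],
              [4, 0, 0, 0]], dtype=np.float32)
)

# len() is not defined for scipy sparse matrices, which is what trips
# the `assert len(X) == len(y)` line in the traceback above.
try:
    len(X_sparse)
    len_failed = False
except TypeError:
    len_failed = True


def to_dense(X):
    """Hypothetical helper: return a dense ndarray whether X is sparse or not."""
    if sparse.issparse(X):
        return X.toarray()
    return np.asarray(X)


# toarray() does not modify X_sparse in place; keep the returned array.
X_dense = to_dense(X_sparse)
n_rows = len(X_dense)  # len() works on the dense array
```

Calling `X_sparse.toarray()` on its own line without keeping the result leaves `X_sparse` sparse, which is one possible reason a conversion can appear not to work.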

This is a common issue though, so I will add some documentation and maybe a helpful error message.
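Purely as an illustration of what a more helpful error could look like, here is a sketch of an input check. The function name `check_split_inputs` and the message wording are assumptions and are not MarkerMap's actual code; it relies only on `scipy.sparse.issparse` and `shape[0]`.

```python
import numpy as np
from scipy import sparse


def check_split_inputs(X, y):
    """Hypothetical guard (not part of MarkerMap) with explicit error messages."""
    if sparse.issparse(X):
        # Point the user at the fix instead of failing inside len().
        raise TypeError(
            "X is a scipy sparse matrix; convert it to a dense array first, "
            "e.g. adata.X = adata.X.toarray()"
        )
    X = np.asarray(X)
    if X.shape[0] != len(y):
        raise ValueError(
            f"X has {X.shape[0]} rows but y has {len(y)} labels"
        )
    return X
```

Checking `shape[0]` rather than `len(X)` is the alternative the TypeError message itself suggests.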