Teichlab / bbknn

Batch balanced KNN
MIT License

IndexError: index 2 is out of bounds for axis 0 with size 2 #32

Closed brianpenghe closed 3 years ago

brianpenghe commented 3 years ago

I have a dataset in which one batch contains only 2 cells. The command was sc.external.pp.bbknn(bdata, batch_key="batch", approx=False). It seems that bbknn can't handle such a small batch, and it gave me this error:

computing batch balanced neighbors
WARNING: unrecognised metric for type of neighbor calculation, switching to euclidean

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-274-086c83c814fc> in <module>
----> 1 sc.external.pp.bbknn(bdata,batch_key = "batch",approx=False)

/usr/local/lib/python3.6/dist-packages/scanpy/external/pp/_bbknn.py in bbknn(adata, batch_key, approx, metric, copy, n_pcs, trim, n_trees, use_faiss, set_op_mix_ratio, local_connectivity, **kwargs)
    118         set_op_mix_ratio=set_op_mix_ratio,
    119         local_connectivity=local_connectivity,
--> 120         **kwargs,
    121     )

/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py in bbknn(adata, batch_key, use_rep, approx, metric, copy, **kwargs)
    289         #call BBKNN proper
    290     bbknn_out = bbknn_pca_matrix(pca=pca, batch_list=batch_list,
--> 291                                  approx=approx, metric=metric, **kwargs)
    292         #store the parameters in .uns['neighbors']['params'], add use_rep and batch_key
    293         adata.uns['neighbors'] = {}

/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py in bbknn_pca_matrix(pca, batch_list, neighbors_within_batch, n_pcs, trim, approx, n_trees, use_faiss, metric, set_op_mix_ratio, local_connectivity)
    346     knn_distances, knn_indices = get_graph(pca=pca,batch_list=batch_list,n_pcs=n_pcs,n_trees=n_trees,
    347                                                                                    approx=approx,metric=metric,use_faiss=use_faiss,
--> 348                                            neighbors_within_batch=neighbors_within_batch)
    349         #sort the neighbours so that they're actually in order from closest to furthest
    350         newidx = np.argsort(knn_distances,axis=1)

/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py in get_graph(pca, batch_list, neighbors_within_batch, n_pcs, approx, metric, use_faiss, n_trees)
    171                         for i in range(ckdout[1].shape[0]):
    172                                 for j in range(ckdout[1].shape[1]):
--> 173                                         ckdout[1][i,j] = ind_to[ckdout[1][i,j]]
    174                         #save the results within the appropriate rows and columns of the structures
    175                         col_range = np.arange(to_ind*neighbors_within_batch, (to_ind+1)*neighbors_within_batch)

IndexError: index 2 is out of bounds for axis 0 with size 2

Is that really due to the fact that one batch contains only two cells?
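
The failing line in the traceback can be reproduced in isolation. This is a minimal sketch, assuming the approx=False path queries a scipy cKDTree built from each batch (suggested by the `ckdout` variable name, not confirmed in this thread) and that 3 neighbours are requested per batch. The arrays and the `ind_to` values here are made up for illustration. When more neighbours are requested than the tree has points, scipy pads the result with the index n (the number of points) and an infinite distance, so mapping that index back through a length-2 array fails.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
small_batch = rng.normal(size=(2, 5))  # a batch with only 2 cells (toy data)
queries = rng.normal(size=(4, 5))      # cells from the other batches (toy data)
ind_to = np.array([10, 11])            # hypothetical global indices of the 2 cells

k = 3  # neighbours requested per batch (assumed value)
distances, indices = cKDTree(small_batch).query(queries, k=k)

# The third "neighbour" does not exist: scipy reports index n (= 2) and distance inf.
print(indices[0])
print(distances[0])

# Translating the padded index to a global index fails, as in the traceback.
try:
    ind_to[indices[0, 2]]
except IndexError as err:
    message = str(err)
    print(message)
```

Under these assumptions the padded index equals the batch size, which matches the "index 2 ... with size 2" wording of the reported error.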

ktpolanski commented 3 years ago

Yep. I encountered this issue once upon a time when a labmate was doing some automated subsetting/integration and didn't notice a tiny two-cell batch, but I never acted on it as I figured it would be extremely unlikely to come up in actual use. I'll have a think about how best to preempt this, probably with some sort of check of each batch's cell count against the expected neighbour total.
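
A check of the kind described above can also be run by hand before calling bbknn. The sketch below is not the check that was eventually added to bbknn. `find_small_batches` is a hypothetical helper, and the default threshold of 3 is an assumption about the `neighbors_within_batch` setting in use, so pass whatever value you actually call bbknn with.

```python
import pandas as pd

def find_small_batches(batches, neighbors_within_batch=3):
    """Return {batch label: cell count} for batches too small to supply
    neighbors_within_batch neighbours (hypothetical helper, not part of bbknn)."""
    counts = pd.Series(batches).value_counts()
    # Ignore unused categories (count 0) that a categorical column may carry.
    small = counts[(counts > 0) & (counts < neighbors_within_batch)]
    return {label: int(n) for label, n in small.items()}

# Toy batch labels: two normal batches and one two-cell batch.
batches = ["a"] * 50 + ["b"] * 40 + ["c"] * 2
small = find_small_batches(batches)
print(small)
```

With an AnnData object like the one in this issue, the labels would come from bdata.obs["batch"]. Any flagged batches can then be dropped or merged into another batch before integration.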

ktpolanski commented 3 years ago

Fixed in 1.4.0