chris-rands closed this 4 years ago.
Thanks for the kind words, and sorry for the slightly delayed reply; I need to start regularly checking the email tied to my GitHub again.
BBKNN performs a KNN search for each batch individually, and then merges the resulting neighbour lists together. This parameter is the k for that search, for each batch. The value of 3 stems from the fact that when computing the KNN for the batch a particular cell is from, the returned neighbours will include the cell itself, regardless of the neighbour identification algorithm. So with k = 3 a cell gets only two genuine neighbours within its own batch, and going any lower than that feels excessive. The value can be adjusted if desired, but it is kept low because that tends to lead to better correction (as you noticed) while also improving run time.
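To make the per-batch search concrete, here is a minimal brute-force sketch of the idea described above, assuming Euclidean distance and plain NumPy. It is not BBKNN's actual implementation: the function name `batch_balanced_knn` is hypothetical, and the sketch stops at the merged neighbour lists rather than doing anything further with them. It shows that each cell ends up with `neighbors_within_batch × n_batches` neighbours, and that the first neighbour found in the cell's own batch is the cell itself.

```python
import numpy as np


def batch_balanced_knn(X, batches, neighbors_within_batch=3):
    """Hypothetical sketch: for every cell, find its k nearest neighbours
    separately within each batch (brute-force Euclidean), then concatenate
    the per-batch results into one neighbour list per cell."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    k = neighbors_within_batch
    unique_batches = np.unique(batches)  # sorted batch labels
    n = X.shape[0]
    indices = np.empty((n, k * len(unique_batches)), dtype=int)
    distances = np.empty((n, k * len(unique_batches)))
    for j, b in enumerate(unique_batches):
        members = np.flatnonzero(batches == b)  # cells belonging to batch b
        if len(members) < k:
            raise ValueError(f"batch {b!r} has fewer than {k} cells")
        # Distances from every cell (any batch) to every cell of batch b.
        d = np.linalg.norm(X[:, None, :] - X[None, members, :], axis=2)
        order = np.argsort(d, axis=1, kind="stable")[:, :k]
        # Store this batch's k neighbours in its own block of columns.
        indices[:, j * k:(j + 1) * k] = members[order]
        distances[:, j * k:(j + 1) * k] = np.take_along_axis(d, order, axis=1)
    return indices, distances


# Usage: 20 random cells split into two batches of 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
batches = np.array(["a"] * 10 + ["b"] * 10)
idx, dist = batch_balanced_knn(X, batches, neighbors_within_batch=3)
print(idx.shape)  # (20, 6): 3 neighbours from each of the 2 batches
```

In the own-batch block, the first column is the cell itself at distance 0, which is why k = 3 leaves only two genuine within-batch neighbours. Raising k widens every batch's block equally, so the merged list stays balanced across batches.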
Thanks for the nice tool! I'm trying to conceptually understand the `neighbors_within_batch` parameter. I read the docstring, but I'm still not clear on exactly what it means. Is it the 'k' when `approx=True`? Setting this value higher leads to a more spread-out UMAP (i.e. less correction), which may be preferable for some datasets? Is there a reason for the default value of `3`?
https://github.com/Teichlab/bbknn/blob/7e736d4eea36369b1ad426667eb1d7b90ad0fd9f/bbknn/__init__.py#L216-L218