Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges

nearestKNeighbors does not handle seqlevel pruning #74

Open yeyuan98 opened 1 year ago

yeyuan98 commented 1 year ago

In the current version (commit https://github.com/Bioconductor/GenomicRanges/commit/d20afa47c9e7acbf213bbdc1ded043bd817f2458), nearestKNeighbors fails if subject has ranges on seqlevels that are not among the seqlevels of x. The failure comes from this line in .nearestKNeighbors:

seqlevels(subject) <- seqlevels(x)

Presumably, changing that line to the following would suffice:

seqlevels(subject, pruning.mode = "tidy") <- seqlevels(x)

I would be happy to open a pull request or contribute in any other way needed. Thanks a lot for the great Bioconductor infrastructure packages.

yeyuan98 commented 1 year ago

Sorry, the fix above introduces a new problem: pruning removes ranges from subject, so the returned indices refer to the pruned subject rather than the original one, which makes the return values meaningless.

I guess end users should prune subject themselves before passing it to this function.
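A minimal sketch of that user-side workaround, which also keeps the returned indices meaningful: instead of pruning seqlevels, drop the offending ranges by position, call nearestKNeighbors() on the reduced subject, and map each hit back to its position in the original subject. The helper name nearestKNeighborsKeepIndex is hypothetical (not part of GenomicRanges), and the code assumes nearestKNeighbors() returns an IntegerList of indices into the subject it was given.

```r
library(GenomicRanges)

# Hypothetical helper, not part of GenomicRanges.
nearestKNeighborsKeepIndex <- function(x, subject, k = 1L, ...) {
    # Positions (in the original subject) of ranges whose seqlevel is known to x
    keep_idx <- which(as.character(seqnames(subject)) %in% seqlevels(x))
    # The reduced subject may still declare the other seqlevels, but none of
    # them are in use any more, so dropping them does not require pruning
    hits <- nearestKNeighbors(x, subject[keep_idx], k = k, ...)
    # Assumes hits is an IntegerList of indices into subject[keep_idx];
    # translate them back through keep_idx (an NA hit stays NA)
    relist(keep_idx[unlist(hits)], hits)
}

# Usage: subject has a range on chr2, which x does not know about
x <- GRanges(c("chr1", "chr1"), IRanges(c(10, 100), width = 1))
subject <- GRanges(c("chr2", "chr1", "chr1"), IRanges(c(5, 12, 95), width = 1))
res <- nearestKNeighborsKeepIndex(x, subject, k = 1L)
```

Filtering by position rather than calling seqlevels<- with a pruning mode keeps an explicit index map (keep_idx), which is exactly what pruning loses.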

hpages commented 1 year ago

@yeyuan98

> I guess the end user should prune the subject on their own before feeding the subject into this function.

Not totally satisfactory. We need to take a closer look, but hopefully there is a better way to handle this. Thanks for reporting it.