Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges

overlaps with character arguments as seqlevels? #52

Open LTLA opened 3 years ago

LTLA commented 3 years ago

I recently needed to write some code to detect whether a GRanges or GRangesList contained elements on a particular chromosome. Well, no problem, I'll just look at the seqnames:

library(GenomicRanges)
example(GenomicRanges, echo=FALSE)
as.logical(seqnames(gr) %in% "chr1")
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE

So far so good. But then I realized that the same approach would not work properly for GRangesLists:

set.seed(10)
grl <- split(gr, sample(3, length(gr), replace=TRUE))
seqnames(grl) %in% "chr1"
## RleList of length 3
## $`1`
## logical-Rle of length 1 with 1 run
##   Lengths:     1
##   Values : FALSE
## 
## $`2`
## logical-Rle of length 2 with 2 runs
##   Lengths:     1     1
##   Values : FALSE  TRUE
## 
## $`3`
## logical-Rle of length 7 with 3 runs
##   Lengths:     2     1     4
##   Values : FALSE  TRUE FALSE

This breaks the GRanges* abstraction that I was hoping to use. As such, I need to write separate GRanges- and GRangesList-specific code to check whether the entries contain any intervals on my desired chromosome, which is not great.
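For illustration, the class-specific code described above might look like the following minimal sketch. The helper name onSeqlevel() is hypothetical and not part of GenomicRanges, and the small gr and grl objects are made up for the example rather than taken from example(GenomicRanges).

```r
library(GenomicRanges)

# Hypothetical helper (not part of GenomicRanges): returns one logical value
# per element of 'x', whether 'x' is a GRanges or a GRangesList.
onSeqlevel <- function(x, seqlevel) {
    hits <- seqnames(x) %in% seqlevel
    if (is(x, "GRangesList")) {
        # 'hits' is an RleList; any() collapses each list element to one value.
        unname(any(hits))
    } else {
        # 'hits' is a logical Rle; expand it to an ordinary logical vector.
        as.logical(hits)
    }
}

# Toy data, invented for this sketch.
gr <- GRanges(c("chr2", "chr1", "chr3", "chr3"), IRanges(c(5, 10, 15, 20), width=3))
grl <- split(gr, c(1, 1, 2, 2))

onSeqlevel(gr, "chr1")   # one value per range
onSeqlevel(grl, "chr1")  # one value per list element
```

The explicit is() branch is exactly the kind of dispatch on object class that one would prefer to avoid.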

However, it occurred to me that an elegant solution would be to repurpose overlapsAny(), which always returns a logical vector. To wit, the following gives me the desired result for both objects:

chr1 <- GRanges("chr1:1-1000")
overlapsAny(gr, chr1)
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
overlapsAny(grl, chr1)
## [1] FALSE  TRUE  TRUE

The above is not quite perfect, as it still requires us to construct chr1, which in turn requires knowing the range of coordinates spanned by the entries on chromosome 1. A more user-friendly version would let us simply write:

overlapsAny(gr, "chr1")
overlapsAny(grl, "chr1")

to achieve the same effect. This would simply require new methods for GRanges(List),character, with the understanding that all character arguments are interpreted as seqlevels by the GenomicRanges overlap infrastructure.
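Until such methods exist, the idea can be approximated with an ordinary wrapper. This is only a rough sketch under stated assumptions: overlapsAnySeqlevels() is a hypothetical name, not an existing GenomicRanges function, and it sidesteps the "know the range" problem by building subject ranges that span the smallest to the largest coordinate found in the query. The toy gr and grl are again invented for the example.

```r
library(GenomicRanges)

# Hypothetical wrapper (not an existing GenomicRanges method): treats the
# character vector 'chr' as seqlevels and delegates to overlapsAny().
overlapsAnySeqlevels <- function(query, chr, ...) {
    # start()/end() give an integer vector for a GRanges and an IntegerList
    # for a GRangesList; unlist() flattens both to a plain integer vector.
    pos <- c(unlist(start(query)), unlist(end(query)))
    if (length(pos) == 0L) {
        # No ranges at all, so nothing can lie on any sequence.
        return(rep(FALSE, length(query)))
    }
    # Assumption of this sketch: a subject spanning every coordinate in the
    # query overlaps each query range located on the same sequence.
    span <- range(pos)
    n <- length(chr)
    subject <- GRanges(chr, IRanges(start=rep.int(span[1], n), end=rep.int(span[2], n)))
    overlapsAny(query, subject, ...)
}

# Toy data, invented for this sketch.
gr <- GRanges(c("chr2", "chr1", "chr3", "chr3"), IRanges(c(5, 10, 15, 20), width=3))
grl <- split(gr, c(1, 1, 2, 2))

overlapsAnySeqlevels(gr, "chr1")
overlapsAnySeqlevels(grl, "chr1")
```

Real methods for the GRanges(List),character signature would presumably not need to inspect the query coordinates at all; the wrapper only shows that the proposed semantics can be expressed in terms of the existing overlap machinery.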