Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges

Added a findOverlaps method for character vectors. #60

Open LTLA opened 2 years ago

LTLA commented 2 years ago

Closes #52 with the following syntactic sugar:

example(GenomicRanges, echo=FALSE)
findOverlaps(gr, "chr1")
## Hits object with 2 hits and 0 metadata columns:
##       queryHits subjectHits
##       <integer>   <integer>
##   [1]         5           1
##   [2]         6           1
##   -------
##   queryLength: 10 / subjectLength: 1

Also works for GRLs:

example(GRangesList, echo=FALSE)
findOverlaps(grl, c("Chrom1"))
## Hits object with 2 hits and 0 metadata columns:
##            from        to
##       <integer> <integer>
##   [1]         2         1
##   [2]         3         1
##   -------
##   nLnode: 3 / nRnode: 1

findOverlaps(grl, c("Chrom1"), type="within")
## Hits object with 1 hit and 0 metadata columns:
##            from        to
##       <integer> <integer>
##   [1]         2         1
##   -------
##   nLnode: 3 / nRnode: 1

maxgap and minoverlap are currently ignored completely. type is mostly ignored, except in the GRL methods: there, type="within" requires every range in a GRL element to lie on the named sequence before that element is considered to overlap the name.
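
To make the GRL semantics described above concrete, here is a rough sketch that emulates them with existing accessors only. It is not the code in this PR; the toy object and the variable names (flat, on_target, group, any_on, all_on) are made up for illustration, and the mapping of the default type to "any range matches" is an assumption based on the example output.

```r
library(GenomicRanges)

# Toy GRangesList with three elements (illustrative data, not the
# object created by example(GRangesList)).
flat <- GRanges(c("Chrom2", "Chrom1", "Chrom1", "Chrom1", "Chrom2"),
                IRanges(start = c(1, 10, 20, 30, 40), width = 5))
grl <- split(flat, c(1, 2, 2, 3, 3))

# Flatten again and record which element each range came from.
flat <- unlist(grl, use.names = FALSE)
group <- factor(rep(seq_along(grl), elementNROWS(grl)),
                levels = seq_along(grl))

# TRUE for each range that sits on the requested sequence name.
on_target <- as.vector(seqnames(flat) == "Chrom1")

# Assumed default behaviour: an element matches if any of its ranges
# lies on the name.
any_on <- as.vector(tapply(on_target, group, any))

# Assumed type="within" behaviour: an element matches only if all of
# its ranges lie on the name.
all_on <- as.vector(tapply(on_target, group, all))

which(any_on)  # elements that would hit under the default type
which(all_on)  # elements that would hit under type="within"
```

The third element mixes Chrom1 and Chrom2, so it is the one that separates the two rules.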

Note that overlapsAny and friends don't yet work out of the box; I'm guessing a Vector_OR_vector signature needs to be added to the methods so that it can pass along the character vectors properly.

lawremi commented 2 years ago

Would this also handle character representation of ranges, like "chr1:1-1000"?

LTLA commented 2 years ago

Not as implemented. An alternative approach would be to coerce character vectors to a GRanges via the usual constructor. This would be more elegant, but it would require the constructor to be smart enough to accept something like GRanges("chrA") and return either chrA:1-MAXINT (if the seqinfo() is not supplied) or chrA:1-len (if the seqinfo() is supplied).
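
A minimal sketch of that coercion idea, written as a standalone helper rather than as a change to the GRanges constructor. The function name seqnamesToGRanges is hypothetical and does not exist in GenomicRanges; it assumes bare sequence names only and falls back to .Machine$integer.max when no length is known.

```r
library(GenomicRanges)

# Hypothetical helper (not part of GenomicRanges): turn bare sequence
# names into ranges that span each whole sequence.
seqnamesToGRanges <- function(x, seqinfo = NULL) {
    stopifnot(is.character(x))
    # Without length information, extend to the largest integer.
    end <- rep(.Machine$integer.max, length(x))
    if (!is.null(seqinfo)) {
        # Look up known lengths; unknown or NA lengths keep the fallback.
        len <- seqlengths(seqinfo)[x]
        known <- !is.na(len)
        end[known] <- len[known]
    }
    GRanges(x, IRanges(start = 1L, end = end))
}

# Usage: no seqinfo gives chrA:1-MAXINT, a seqinfo gives chrA:1-len.
seqnamesToGRanges("chrA")
seqnamesToGRanges("chrA", Seqinfo("chrA", seqlengths = 1000L))
```

With such a helper, a call like findOverlaps(gr, seqnamesToGRanges("chr1")) would go through the existing GRanges methods, so maxgap, minoverlap and type would be handled by the usual code path instead of being ignored.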