Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges
43 stars 18 forks source link

gaps and GRangesList #17

Open kasperdanielhansen opened 6 years ago

kasperdanielhansen commented 6 years ago

gaps() does not work on a GRangesList:

> gaps(grIntrons.24)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘gaps’ for signature ‘"CompressedGRangesList"’

I would expect it to work :) Running endoapply(grIntrons.24, gaps) takes forever.

malcook commented 5 years ago

FWIW, and to do more than simply +1 this issue, perhaps provoke a reaction, or help a fellow traveler, here is my workaround for same:

 grlGaps<-function(grl) {
    ## ALAS: [gaps does not work on a
    ## GRangesList](https://github.com/Bioconductor/GenomicRanges/issues/17)
    ##
    ## Here is quick-to-code but slow-to-run implementation:
    ## return(endoapply(grl,gaps,NA,NA))
    ## NB: the NA arguments are required to on include "internal"
    ## gaps.
    ## 
    ## By my timing, the below is about 175x faster than endoapply.
    ## It requires grl to be 'named' uniquely.  There may be some
    ## other ways its assumptions are not what you would expect by
    ## gaps on a GRangesList.
    stopifnot(0<length(names(grl)))
    stopifnot(0==anyDuplicated(names(grl)))
    r<-unlist(range(grl))
    girl<-gaps(ranges(grl),NA,NA) # gaps as an iranges list
    l<-lengths(girl)
    rdf<-as.data.frame(r)
    lkup<-rdf[names(girl),]
    res<-GRanges(rep(lkup$seqnames,l),unlist(girl),rep(lkup$strand,l))
    res<-split(unname(res),names(res))
    missing<-setdiff(names(grl),names(res))
    missing<-`names<-`(makeGRangesListFromFeatureFragments(missing
                                                          ,rep(list(NULL),length(missing))
                                                          ,rep(list(NULL),length(missing))),
                       missing)
    res<-c(res,missing)[names(grl)] # put them in the same order as grl.
    res
  }
lawremi commented 5 years ago

In most cases, it's often preferable to do something like getting the transcript ranges and then doing a psetdiff(tx, introns_or_exons).

malcook commented 5 years ago

Thanks @lawremi - even faster and cleaner your way, which method I had forgotten about. So, my function (if we still want a function at all) can be re-written

  grlGaps<-function(grl) {
    ## for discussion, c.f.  [gaps does not work on a GRangesList](https://github.com/Bioconductor/GenomicRanges/issues/17) .
   psetdiff(unlist(range(grl),use.names=FALSE),grl))
}