Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges

Some fixes I would like #1

Closed Roleren closed 6 years ago

Roleren commented 6 years ago

Two fixes I would like implemented.

First, GenomicRanges::sort() for GRangesList should have an ignore.strand argument (just like it has for GRanges objects), so that you can sort following either the GRanges standard (within a minus-strand group, the range with the highest start comes first) or the BED standard (within a minus-strand group, the range with the highest start comes last).

Second, GenomicRanges::sort() for GRangesList is also waaay too slow. In my pipeline, with a GRangesList of 76k transcripts, just sorting the output GRangesList takes 99% of the total run time. That is not good.

This example sorts a million GRanges groups in < 1 second by letting data.table do the ordering (order() inside DT[...] is optimized by data.table):

    library(GenomicRanges)
    library(data.table)
    DT <- as.data.table(grl)
    asgrl <- makeGRangesListFromDataFrame(
        DT[order(group, start)], split.field = "group",
        names.field = "group_name", keep.extra.columns = TRUE)
    names(asgrl) <- names(grl)

Of course, this is more dangerous, I guess. The example only handles "+" strand groups; "-" strand groups must use order(..., decreasing = TRUE).
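
To handle both strands in one pass, the data.table approach above could be extended roughly as follows. This is only a sketch: sortGRLFast, by.strand and sort_key are hypothetical names, not part of GenomicRanges. It assumes every group contains ranges from a single strand and that the list has no empty elements (those would be lost in the data frame round trip), and it does not carry over the seqinfo of the input.

```r
library(GenomicRanges)
library(data.table)

# Hypothetical helper (not a GenomicRanges function): sort the ranges
# inside each list element by start, using data.table for the ordering.
# by.strand = TRUE : "-" strand groups are sorted by decreasing start
# by.strand = FALSE: all groups are sorted by increasing start (BED-like)
sortGRLFast <- function(grl, by.strand = TRUE) {
  # as.data.frame() on a GRangesList yields the columns group, group_name,
  # seqnames, start, end, width, strand, plus any metadata columns
  dt <- as.data.table(as.data.frame(grl))

  # Negating start for "-" ranges turns an increasing sort on sort_key
  # into a decreasing sort on start for those ranges
  if (by.strand) {
    dt[, sort_key := ifelse(strand == "-", -start, start)]
  } else {
    dt[, sort_key := start]
  }
  setorder(dt, group, sort_key)
  dt[, sort_key := NULL]

  out <- makeGRangesListFromDataFrame(
    dt, split.field = "group", names.field = "group_name",
    keep.extra.columns = TRUE)
  names(out) <- names(grl)
  out
}

# Small usage example: group "a" is on "+", group "b" is on "-"
gr <- GRanges("chr1", IRanges(c(30, 10, 5, 50), width = 5),
              strand = c("+", "+", "-", "-"))
grl <- split(gr, c("a", "a", "b", "b"))
sorted <- sortGRLFast(grl)
```

Negating the start avoids splitting the table by strand and sorting twice; whether it is as fast as the plain order(group, start) call above would need to be measured on real data.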

hpages commented 6 years ago

Isn't one supposed to say he's been a good boy in his letter to Santa? And also to say "please" and "thank you"? This is implemented in GenomicRanges 1.31.2 (devel only).