Closed m-swirski closed 4 years ago
This function was made before split became a fast function to make GRL, and I never had a problem with names not following the order, but I see this is a problem, will update to make it a simpler wrap of split. The main reason is still speed, split is still slow if all GR objects are called ENST10000000101 etc, so that is why a split on integers is so much faster, but will make it work for recurring group members.
Fixed in latest version on github. Will now push the changes I have now to devel, thanks for all the feedback. All bugs are now fixed I think, and I will look at your remaining enchantments requests.
re-Open issue if there is anything else here.
the function is based on transforming names (or other argument) to Rle object and using runLength() function to get grouping factor based on run lengths. It would work perfectly if granges objects were sorted by names, but they usually are sorted by coordinate, so it can easily happen if you have, for example, a nested gene within an intron. consider this snippet:
library(ORFik) gr <- GRanges(seqnames = "chrI", IRanges(start = c(10, 50, 100, 200), end = c(20,60,110,210)), strand = factor("+", levels = c("+", "-", "*"))) names(gr) <- c("a", "a", "b", "a") gr_grouped <- groupGRangesBy(gr) gr_split <- split(gr, names(gr)) identical(gr_grouped, gr_split) names(gr_grouped)
NAs are introduced without warning.
I made a wrapper to get groupGRangesBy work properly:
groupGRangesBy_fix <- function(gr, other = NULL) { if (is.null(names(gr)) & is.null(other)) stop(wmsg("gr object has no names")) if (!is.null(names(gr))) { names <- names(gr) } else names <- other names_uniq <- unique(names) reorder <- order(match(names, names_uniq)) gr <- gr[reorder] if (!(is.null(other))) other <- other[reorder] groupGRangesBy(gr, other=other) }
gr_grouped2 <- groupGRangesBy_fix(gr) identical(gr_grouped2, gr_split)
so it works, but in fact why not just wrap split() function to a more convenient form with warning display etc?