Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges

subtract for two GRangesList #68

Closed: gevro closed this issue 1 year ago

gevro commented 1 year ago

Hi, is it possible to modify subtract() so that it also works with two GRangesList objects, performing subtract() on each pair of elements of x and y that have the same name?
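Pairing elements "that have the same name" amounts to a name lookup, which base R's match() performs. A minimal sketch with hypothetical element names (geneA, geneB, geneC), independent of GenomicRanges:

```r
## Hypothetical names of the elements of two GRangesList objects x and y
x_names <- c("geneA", "geneB", "geneC")
y_names <- c("geneC", "geneA")

## For each name in x, the position of the same name in y (NA if absent)
map <- match(x_names, y_names)
map        # c(2L, NA, 1L)

## Elements of x that have no partner in y
unpaired <- x_names[is.na(map)]
unpaired   # "geneB"
```

An element of x without a partner in y still needs a defined result, e.g. subtracting nothing from it.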

hpages commented 1 year ago

Hi,

So it would be something like this?

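library(GenomicRanges)  # provides subtract(), GRanges(), GRangesList; also attaches S4Vectors (wmsg())

## Apply subtract() to each pair of x and y elements that share a name
## (or that share a position when x or y is unnamed).
subtract_GRangesList <- function(x, y, minoverlap=1L, ignore.strand=FALSE)
{
    stopifnot(is(x, "GRangesList"), is(y, "GRangesList"))
    x_len <- length(x)
    x_names <- names(x)
    y_names <- names(y)
    if (is.null(x_names) || is.null(y_names)) {
        ## No names to match on: pair the elements by position.
        if (x_len != length(y))
            stop(wmsg("'x' and 'y' must have the same length when 'x' or 'y' has no names"))
        map <- seq_len(x_len)
    } else {
        ## map[i] is the index in 'y' of the element named names(x)[i], or NA.
        map <- match(x_names, y_names)
    }
    lapply(setNames(seq_len(x_len), x_names),
        function(i) {
            gr1 <- x[[i]]
            j <- map[[i]]
            ## An element of 'x' with no partner in 'y' is subtracted against an empty GRanges.
            gr2 <- if (is.na(j)) GRanges() else y[[j]]
            subtract(gr1, gr2, minoverlap=minoverlap, ignore.strand=ignore.strand)
        })
}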

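Note that this returns a list of GRangesList objects, that is, an ordinary list (of the same length as x) where each list element is the GRangesList object returned by subtract(), with one element per range in the corresponding x[[i]]. The function won't be very efficient if x is a long GRangesList object (e.g. with a length greater than a few thousand), because it calls subtract() once per element of x in an R-level lapply() loop.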
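The "list of GRangesList objects" shape comes from subtract() itself, which returns a GRangesList parallel to its first argument. A minimal sketch with hypothetical toy ranges; the final step, which collapses each GRangesList into a single GRanges per name, is an optional post-processing assumption and not part of the function above:

```r
suppressPackageStartupMessages(library(GenomicRanges))

## Hypothetical toy ranges: gr2 cuts a hole in the first range of gr1.
gr1 <- GRanges("chr1", IRanges(start=c(1, 101), end=c(50, 200)))
gr2 <- GRanges("chr1", IRanges(start=21, end=30))

## subtract() returns a GRangesList parallel to gr1: one element per range in gr1.
res <- subtract(gr1, gr2)
length(res)   # 2
res[[1]]      # chr1:1-20 and chr1:31-50
res[[2]]      # chr1:101-200 (no overlap with gr2, left unchanged)

## A hand-built stand-in for the ordinary list that subtract_GRangesList() returns.
## Unlisting each GRangesList gives one GRanges per name, collected in a GRangesList.
res_list <- list(a=res, b=subtract(gr2, gr1))
flat <- GRangesList(lapply(res_list, unlist, use.names=FALSE))
lengths(flat)  # a = 3, b = 0
```

Unlisting discards the per-range grouping, so it only makes sense when the caller does not need to know which range of x[[i]] each piece came from.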

I'm not too keen on adding this to GenomicRanges, at least not in its current form. Feel free to copy and paste this function into your own code.

Best, H.

gevro commented 1 year ago

Thank you so much!