BIMSBbioinfo / genomation

R package for genomic feature analysis and visualization
http://bioinformatics.mdc-berlin.de/genomation/
73 stars 22 forks source link

[Feature request] Keep all windows in the scoreMatrix #197

Open balwierz opened 3 years ago

balwierz commented 3 years ago

Currently ScoreMatrix does not keep all windows in the output. It removes all windows which overlap exactly 0 features in target. This causes two research problems:

Personally, I cannot imagine a situation where removing windows would be beneficial. No overlap is a valid research result. Zero signal lies on the continuum next to very low signal (which is included).

frenkiboy commented 3 years ago

Dear Piotr,

Rownames of the scorematrix should correspond to the windows which were kept during the overlap.

You can also get around this by setting up the seqlevels (from one of your previous examples): seqlengths(gr3) = seqlengths(gr4) = 10

Cheers,

v

balwierz commented 3 years ago

Hi,

Thanks, I know about the rownames and I am using them. I also always fix seqInfo of any GRanges object I create in R. I started this not for myself, but rather for other users. As I know some people never assign seqInfo.

I am currently using a slightly slower, high-level variation on ScoreMatrix to keep all the windows and keep window:target orientation pairs only (https://github.com/BIMSBbioinfo/genomation/issues/108)

ScoreMatrixPiotr <- function(target, windows, ignore.strand=FALSE)
{
    stopifnot(max(width(windows)) == min(width(windows)))
    summariseOneRow <- function(target, window)
    {
        as.numeric(append(coverage(ranges(target)), Rle(0L, 1000000000))[ranges(window)])
    }
    ret <- matrix(0, ncol = length(windows), nrow=width(windows[1])) # this will be transposed
    o <- findOverlaps(target, windows, ignore.strand=ignore.strand) %>%
        as_tibble() %>%
        group_by(subjectHits) %>%
        summarise(data=list(summariseOneRow(target[queryHits], windows[subjectHits[1]])))
    rowI <- pull(o, subjectHits)
    data <- pull(o, data)
    strands <- strand(windows) %>% as.vector()
    for(i in seq_along(rowI))
    {
        if(strands[rowI[i]] == "-")
            ret[ , rowI[i]] <- rev(data[i][[1]])
        else
            ret[ , rowI[i]] <- data[i][[1]]
    }
    t(ret)
}