Roleren / ORFik

MIT License
33 stars 9 forks source link

reason for length(extension) == 1L condition in extendLeaders and extendTrailers #99

Closed m-swirski closed 4 years ago

m-swirski commented 4 years ago

Hi, I was wondering, why restrict extension in these function to a single integer (or a grangeslist)? These seem to work fine with an integer vector with length(extension) == length(grl) And in some cases it's desirable to extend each grl object with a different value;

library(ORFik) attach(loadNamespace("ORFik"))

the only difference is in the first if statement

extendLeaders_multi <- function (grl, extension = 1000L, cds = NULL) { if (is(extension, "numeric") && length(extension) %in% c(1L, length(grl))) { posIndices <- strandBool(grl) promo <- promoters(unlist(firstExonPerGroup(grl), use.names = FALSE), upstream = extension) newStarts <- rep(NA, length(grl)) newStarts[posIndices] <- as.integer(start(promo[posIndices])) newStarts[!posIndices] <- as.integer(end(promo[!posIndices])) } else if (is.grl(class(grl))) { starts <- startSites(extension) changedGRL <- downstreamFromPerGroup(grl[names(extension)], starts) return(changedGRL) } else { stop("extension must either be an integer, or a GRangesList") } extendedLeaders <- assignFirstExonsStartSite(grl, newStarts) if (is.null(cds)) return(extendedLeaders) return(addCdsOnLeaderEnds(extendedLeaders, cds)) }

extendTrailers_multi <- function (grl, extension = 1000L) { if (is(extension, "numeric") && length(extension) %in% c(1L, length(grl))) { posIndices <- strandBool(grl) promo <- flank(unlist(lastExonPerGroup(grl), use.names = FALSE), width = extension, start = FALSE) newEnds <- rep(NA, length(grl)) newEnds[posIndices] <- as.integer(end(promo[posIndices])) newEnds[!posIndices] <- as.integer(start(promo[!posIndices])) } else if (is.grl(class(grl))) { starts <- startSites(extension) changedGRL <- upstreamOfPerGroup(grl[names(extension)], starts, allowOutside = TRUE) return(changedGRL) } else { stop("extension must either be an integer, or a GRangesList") } return(assignLastExonsStopSite(grl, newEnds)) }

gtf_file <- system.file("extdata", "annotations.gtf", package = "ORFik") txdb <- loadTxdb(gtf_file) tx <- exonsBy(txdb, by = "tx", use.names = TRUE) tx[1:3] extendLeaders_multi(tx[1:3], c (10,20,30)) extendTrailers_multi(tx[1:3], c (10,20,30))

So it seems to work without any problems...

Roleren commented 4 years ago

I thought about this point a lot, and I did it because this made me code better, less error when I put in the wrong integers, but will think about it a bit more. I see the point in having the option for full length of integers.

m-swirski commented 4 years ago

I think that these functions (startRegion etc too, but I didn't check how easy would it be to modify them) should work analogously to intra-range-methods from genomic ranges, etc, to make it consistent with bioconductor. And I really find it useful to have an option of providing whole vector of values rather than just one number.

Roleren commented 4 years ago

Ok, good argument, will add it.

Roleren commented 4 years ago

Updated extendLeaders and extendTrailers to work for length(grl).

It is possible to update startRegion or more precisely windowPerGroup, but can check out that later if it is needed. Will close issue.