gagneurlab / FRASER

FRASER - Find RAre Splicing Events in RNA-seq
MIT License
36 stars 21 forks source link

Error: checkIntergenic on unlocalized and unplaced sequences #74

Closed jakobjn closed 5 months ago

jakobjn commented 5 months ago

When using comprehensive gene annotations, unlocalized and unplaced sequences result in an error when attempting to annotate potential impact of splice junctions with annotatePotentialImpact.

Debugging

  1. The error arises since no genes in the txdb are located on unlocalized chr*_random and unplaced chrUn_* sequences.
  2. This causes min(distance(test_junction, refseq.genes), na.rm = TRUE) to evaluate to Inf. https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L817
  3. distance(test_junction, refseq.genes) in turn evaluates to a vector of exclusively NA's.
  4. min(c(NA, NA, ...), na.rm=TRUE) is equivalent to min(numeric(0)), which evaluates to Inf.
  5. As Inf is greater than 0, the conditional dist > 0 evaluates to TRUE, whereas the code block under the if-statement is erroneously executed. https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L818-L830

Fix

This error can be resolved by modifying the condition of the if-statement from dist > 0 to is.finite(dist) && dist > 0. https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L818

Error

1. annotatePotentialImpact(result = res_junc_dt, txdb = txdb, fds = fds)
6. addPotentialImpactLabels(annoResult, fds, txdb)
8. sapply(noLaps, function(i) {
 .     overlap <- to(findOverlaps(junctions_gr[i], exons))
 .     if (length(overlap) == 0) {
 .         return(checkIntergenic(junctions_gr, i, refseq.genes))
 .     }
 .     for (j in overlap) {
 .         exon_start = start(exons[j])
 .         exon_end = end(exons[j])
 .         if (exon_start <= start(junctions_gr[i]) & exon_end >= 
 .             end(junctions_gr[i])) {
 .             if ((end(junctions_gr[i]) - start(junctions_gr[i]) + 
 .                 1)%%3 != 0) {
 .                 frs = "likely"
 .             }
 .             else {
 .                 frs = "unlikely"
 .             }
 .             return(c("exonTruncation", frs))
 .         }
 .     }
 .     return(c("complex", "inconclusive"))
 . })
9. lapply(X = X, FUN = FUN, ...)
10. FUN(X[[i]], ...)
11. checkIntergenic(junctions_gr, i, refseq.genes)
12. start(refseq.genes[nearest(junctions_gr[i], refseq.genes)])
13. refseq.genes[nearest(junctions_gr[i], refseq.genes)]
14. refseq.genes[nearest(junctions_gr[i], refseq.genes)]
15. subset_along_ROWS(x, i, drop = drop)
16. extractROWS(x, i)
17. extractROWS(x, i)
18. normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
19. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
  .     allow.NAs = allow.NAs)
20. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
  .     allow.NAs = allow.NAs)
21. .subscript_error("subscript contains NAs")
22. stop(wmsg(...), call. = FALSE)
23. .handleSimpleError(function (cond) 
  . .Internal(C_tryCatchHelper(addr, 1L, cond)), "subscript contains NAs", 
  .     base::quote(NULL))
24. h(simpleError(msg, call))
ischeller commented 5 months ago

Hi @jakobjn , thanks for reporting this! I will look into this and come back to you. Can you add which FRASER / R / Bionconductor version you are using and provide an example of a txdb object that results in the error?

jakobjn commented 5 months ago

@ischeller I'm using FRASER 1.99.3 (1b0cd9581ba1b1ae2bf329821c3a5823ef6505b7) as a Conda package built from this custom recipe. This is to exclusively utilize Conda for package management.

c-mertes commented 5 months ago

Thanks for the report on this. I merged it now. Please let us know if you still encounter issues.