cnobles / iGUIDE

Bioinformatic pipeline for identifying dsDNA breaks by marker based incorporation, such as breaks induced by designer nucleases like Cas9.
https://iguide.readthedocs.io/en/latest/
GNU General Public License v3.0
20 stars 9 forks source link

Negative values in genes of interest Fishers test #65

Closed helixscript closed 4 years ago

helixscript commented 4 years ago

There are instances where there are more special genes in the query than in the reference.

tools/rscripts/evaluate_incorp_data.R L1495

ref <- enrich_df[1, c("total", "special"), drop = TRUE]
query <- enrich_df[i, c("total", "special"), drop = TRUE]
ref$diff <- abs(diff(as.numeric(ref)))
query$diff <- abs(diff(as.numeric(query)))
 mat <- matrix(
     c(
        ref$total - ref$special - query$total + query$special,
       query$diff, ref$special - query$special, query$special
    ),
     nrow = 2
 )
> fisher.test(mat)$p.value
Error in fisher.test(mat) : 
  all entries of 'x' must be nonnegative and finite

> mat
      [,1] [,2]
[1,] 27060   -3
[2,] 15747   40

> ref
$total
[1] 42844

$special
[1] 37

$diff
[1] 42807

> query
$total
[1] 157
$special
[1] 40

$diff
[1] 15747

ref$special - query$special
[1] -3
cnobles commented 4 years ago

I believe the error arises due to a flaw in the logic behind comparing the paired and matched sites to the reference (at least that's one part). Here we are trying to compare if there are more onco or special genes enriched in the dataset, but while the reference was focusing on unique genes, the query could have repeated genes from either incorporation occurring in multiple areas within the gene or near / within the gene. This has been corrected in commit 49394263bbdc5f2a9020c006177fef251dc1b373.

Additionally, I reviewed the Fisher's Exact test logic and came to the conclusion that it had been overthought. It is now a simpler test that follows what is described in the documentation (testing for gene set enrichment compared to the reference genome).

With these fixes, I've moved the versioning up to 1.0.1. Let me know if this update resolves your issues @everettJK.