Bug in seq_to_character function

JasperYH commented 4 years ago

The abundance_threshold and alignment_score_threshold filter doesn't work. I'm not sure why.

JasperYH commented 4 years ago

input_file <- "LIBILCM_9.txt"
name <- strsplit(input_file,"\\.")[[1]][1]
  out_file <- paste("./cache/traceQC_obj/",name,".rds",sep="")
  if (!file.exists(out_file)) {
    print(name)
    obj <- create_TraceQC_object(aligned_reads_file = paste("./cache/alignment/",input_file,sep=""),
                                 ref_file = ref_file,
                                 fastqc_file = paste("./qc_report/",name,"_fastqc.zip",sep=""),
                                 alignment_score_threshold = 200,
                                 abundance_threshold = 10,
                                 ncores = 1)
    saveRDS(obj,file=out_file)

hyunhwan-jeong commented 4 years ago

Hi @JasperYH, I ran the script (with small modifications), and I did not see any error.

setwd("/mnt/hdd/jasper/CRISPR_barcode/Shawn3")
input_file <- "LIBILCM_9.txt"
ref_file <- "/home/jasper/hdd/CRISPR_barcode/Shawn3/data/ref21.txt"
name <- strsplit(input_file,"\\.")[[1]][1]
print(name)
obj <- TraceQC::create_TraceQC_object(aligned_reads_file = paste("./cache/alignment/",input_file,sep=""),
                           ref_file = ref_file,
                           fastqc_file = paste("./qc_report/",name,"_fastqc.zip",sep=""),
                           alignment_score_threshold = 200,
                           abundance_threshold = 10,
                           ncores = 1)
saveRDS(obj,file=out_file)

Can you try it once you reinstall the package by devtools::install_github("https://github.com/LiuzLab/TraceQC")?

JasperYH commented 4 years ago

Can you check the output file to see if the filter works?

hyunhwan-jeong commented 4 years ago

I don't think so

> max(obj$aligned_reads$score)
[1] 293.4
> min(obj$aligned_reads$score)
[1] -29.6

hyunhwan-jeong commented 4 years ago

Just checked obj$mutation, but it seems the filter is not working

> summary(obj$mutation)
                                                                                                                                       target_seq  
 AACTTGAAAG------------------------ATC---------GGAAGAGCACACGTCTGAACTCCAGT------CACCGCTCA------TTATCGCGTATGCCGTCTTCTGCTTGAA----------AAAATAAG:  17  
 AACTTGAAAG------------------------ATC---------GGAAGAGCACACGTCTGAACTCCAGT------CACCGCTCA------TTATCTCGTATGCCGTCTTCTGCTTGAAAAA--AA-----------:  17  
 AACTTGAAAG------------------------ATC---------GGAAGAGCACACGTCTGAACTCCAGT------CACCGCTCA------TTATCTCGTATGCCGTCTTCTGCTTGAAAAA-----------AAAG:  17  
 AACTTGAAAG------------------------ATC---------GGAAGAGCACACGTCTGAACTCCAGT------CACCGCTCA------TTATCTCGTATGCCGTCTTCTGCTTGAA----------AAAATAAG:  17  
 AACTTGAAAG------------------------ATC---------GGAAGAGCACACGTCTGAACTCCAGT------CACCGCTCA------TTATCACGTATGCCGTCTTCTGCTTGAA---------------AAA:  16  
 AACTTGAAAG------------------------ATC---------GGAAGAGCACACGTCTGAACTCCAGT------CACCGCTCA------TTATCGCGTATGCCGTCTTCTGCTTGAAAAA---------------:  16  
 (Other)                                                                                                                                    :4651  
 alignment_score        type          start            length                mutate_to        count        
 Min.   : 60.9   unmutated:   1   Min.   :  0.00   Min.   : 0.000   G             :1070   Min.   :     24  
 1st Qu.: 79.0   deletion : 996   1st Qu.: 61.00   1st Qu.: 1.000   -             : 997   1st Qu.:    322  
 Median :246.8   mutation :2735   Median : 63.00   Median : 1.000   A             : 836   Median :    614  
 Mean   :205.2   insertion:1019   Mean   : 64.96   Mean   : 6.741   C             : 530   Mean   :   5889  
 3rd Qu.:264.5                    3rd Qu.: 83.00   3rd Qu.:14.000   AGACG         : 471   3rd Qu.:   1989  
 Max.   :293.4                    Max.   :112.00   Max.   :97.000   AGCAAACGTTTGGA: 356   Max.   :1923186  
                                                                    (Other)       : 491

JasperYH commented 4 years ago

When I run the code directly (not using the function), it works well.

hyunhwan-jeong commented 4 years ago

Can you share your output as the same as I did?

hyunhwan-jeong commented 4 years ago

I added some print() for a debug purpose (not commit this modification) as follows:

  print(summary(aligned_reads))
  print(nrow(aligned_reads))
  print(aligned_reads %>% filter(score>alignment_score_threshold) %>% nrow())
  aligned_reads <- aligned_reads %>%
    filter(score>alignment_score_threshold) %>%
    group_by(target_seq,target_ref) %>%
    summarise(count=n(),score=max(score)) %>%
    ungroup

  print(sum(aligned_reads$count))

Here is the output:

     name               seq                ref                score        target_ref         target_seq         spacer_ref       
 Length:7770926     Length:7770926     Length:7770926     Min.   :-29.6   Length:7770926     Length:7770926     Length:7770926    
 Class :character   Class :character   Class :character   1st Qu.:238.8   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Median :250.8   Mode  :character   Mode  :character   Mode  :character  
                                                          Mean   :213.9                                                           
                                                          3rd Qu.:264.5                                                           
                                                          Max.   :293.4                                                           
  spacer_seq          PAM_ref            PAM_seq         
 Length:7770926     Length:7770926     Length:7770926    
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  

[1] 7770926
[1] 7758082
[1] 7758082
[1] 6805198

hyunhwan-jeong commented 4 years ago

I finally found why. The augments were swapped.

In TraceQC.R:

https://github.com/LiuzLab/TraceQC/blob/bff209b74fbe4bc870e1b3f41054ec98fe95427c/R/TraceQC.R#L110-L111

In mutation_events.R:

https://github.com/LiuzLab/TraceQC/blob/bff209b74fbe4bc870e1b3f41054ec98fe95427c/R/mutation_events.R#L80-L83

It has been solved, and I will commit the fix: Let's place alignment_score_threshold first.

LiuzLab / TraceQC

Bug in seq_to_character function #21