apeltzer / ReportTable

The Report Generation Engine for the EAGER Pipeline
GNU General Public License v3.0

Endogenous DNA calculation for mtCapture mode incorrect. #8

Open jfy133 opened 8 years ago

jfy133 commented 8 years ago

Summary When running EAGER 1.92.9 (1.92.10 in Jena) in mtCapture mode (with CircularMapper and mapQ 37) on mt-enriched DNA, the Endogenous DNA (%) column in ReportTable gives an incorrect value. It should be: on-target mtDNA reads prior to DeDup / reads after C&M

Description and Example Currently, when mtCapture mode is on, ReportTable takes the % Endogenous DNA from the '*realigned.bam.stats' file in 3-Mapper. However, the read total in this file is the 'input' number of reads from the preceding mapping to the whole nu+mt genome, which differs from the number of reads after C&M (prior to mapping) given in ReportTable, leading to an erroneous % value.

For example, ReportTable gives:

Sample: JK2760
No. reads after C&M: 3,617,343
No. mapped reads prior RMDup: 399,317
Mapped reads after RMDup: 17,907
Endogenous DNA (%): 15.024

The Endogenous DNA value of 15% derives from this stats file, which gives the following numbers:

2657863 + 0 in total (QC-passed reads + QC-failed reads)
[...]
399317 + 0 mapped (15.02% : N/A)
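The two counts quoted from the stats file can be pulled out programmatically. This is a minimal Python sketch, assuming the file follows the flagstat-style layout shown in the excerpt; the function name `parse_flagstat_counts` is hypothetical and is not part of ReportTable or EAGER.

```python
import re

def parse_flagstat_counts(text):
    """Return (total, mapped) QC-passed read counts from flagstat-style text.

    Assumes lines of the form '<passed> + <failed> in total ...' and
    '<passed> + <failed> mapped (...'. This layout is taken from the
    excerpt in this issue, not from the ReportTable source code.
    """
    total = mapped = None
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) \+ (\d+) (in total|mapped) ", line)
        if not m:
            continue  # skip lines such as '[...]' or other categories
        if m.group(3) == "in total":
            total = int(m.group(1))
        else:
            mapped = int(m.group(1))
    if total is None or mapped is None:
        raise ValueError("could not find 'in total' and 'mapped' lines")
    return total, mapped

# Excerpt copied from the stats file quoted in this issue.
excerpt = """2657863 + 0 in total (QC-passed reads + QC-failed reads)
399317 + 0 mapped (15.02% : N/A)"""

total, mapped = parse_flagstat_counts(excerpt)
print(total, mapped)
print(f"{100 * mapped / total:.2f}%")
```

Dividing the two parsed counts reproduces the 15.02% printed in the stats file, which shows that the reported value uses the stats file's own total as the denominator rather than the C&M read count.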

As you can see, the number of 'input reads' into the Mapping in the .stats file is not the same as the one presented after C&M.

The correct endogenous DNA calculation should be (399,317 / 3,617,343) * 100 ≈ 11.04% (on-target mt reads / input reads)
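The difference between the two denominators can be checked with a few lines of Python. This is only a sketch of the arithmetic using the numbers quoted in this issue; `endogenous_pct` is a hypothetical helper, not a ReportTable function.

```python
def endogenous_pct(on_target_reads, input_reads):
    """Percentage of input reads that map on target."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    return 100.0 * on_target_reads / input_reads

mapped_mt = 399_317      # mapped reads prior to RMDup (ReportTable)
after_cm = 3_617_343     # reads after C&M (ReportTable)
stats_total = 2_657_863  # 'in total' line of the realigned stats file

# Denominator currently used (stats file total) vs. the proposed one (C&M).
print(f"reported:  {endogenous_pct(mapped_mt, stats_total):.2f}%")
print(f"corrected: {endogenous_pct(mapped_mt, after_cm):.2f}%")
```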

Solution When mtCapture mode is turned on, Endogenous DNA should be calculated at least as: reads mapping to MTCHR prior to DeDup / reads after C&M prior to mapping

Or preferably: reads mapping to MTCHR prior to map quality filtering and DeDup / reads after C&M prior to mapping

Additionally, the number of reads mapping to the whole genome (prior filtering for just mt reads) should be included as a new column.
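The proposal could be sketched in Python roughly as follows. All names here (`mtcapture_row` and its parameters and keys) are hypothetical illustrations, not ReportTable's actual fields. The sketch prefers the MTCHR count taken before map quality filtering when it is available, falls back to the count prior to DeDup otherwise, and carries the whole-genome mapped count as an extra column.

```python
from typing import Optional

def mtcapture_row(reads_after_cm: int,
                  mt_mapped_pre_dedup: int,
                  mt_mapped_pre_mapq: Optional[int] = None,
                  whole_genome_mapped: Optional[int] = None) -> dict:
    """Build a report row for mtCapture mode (hypothetical layout)."""
    if reads_after_cm <= 0:
        raise ValueError("reads_after_cm must be positive")
    # Preferred numerator: MTCHR reads before mapQ filtering and DeDup.
    # Fallback numerator: MTCHR reads prior to DeDup only.
    if mt_mapped_pre_mapq is not None:
        numerator = mt_mapped_pre_mapq
    else:
        numerator = mt_mapped_pre_dedup
    return {
        "reads_after_cm": reads_after_cm,
        "whole_genome_mapped": whole_genome_mapped,  # proposed new column
        "mt_on_target": numerator,
        "endogenous_pct": round(100.0 * numerator / reads_after_cm, 3),
    }

# Only the counts quoted in this issue are filled in; the pre-mapQ and
# whole-genome counts are not given there, so they stay None.
row = mtcapture_row(reads_after_cm=3_617_343, mt_mapped_pre_dedup=399_317)
print(row)
```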

Example Files:

EAGER config file
EAGER log.log & execution.log
Stats logs from Clip&Merge and Mapping output folders
ReportTable of corresponding run

mtcap_EndoCalc_Issue.zip

sc13-bioinf commented 8 years ago

EAGER wrongly uses the stats file from the realigned output (this is after mapping):

output/Sample_JK2760/3-Mapper/JK2760_TTCTGAATGGCGGT_L008_R1_001.fastq.merged.fq.MT_realigned.bam.stats

I have added a CircularMapper pipeline, but I still need to decide how to use it to avoid calculating the endogenous DNA from the realigned stats.