Closed by maegsul 4 years ago
Hi,
That's correct. The MarkDuplicates step is included to provide this annotation in the BAM files, but duplicates are not excluded from downstream quantifications in the core pipeline. This is due to ambiguities in resolving the source of duplicates, which can be biological or technical (see for example here).
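As a minimal sketch of what this annotation looks like: in the SAM specification, duplicates are marked by setting bit 0x400 in the FLAG field (the second column), and the read itself stays in the file. The snippet below is only an illustration, not code from the pipeline or from RNA-SeQC; the function names and the example records are hypothetical.

```python
# Bit 0x400 of the SAM FLAG field means "PCR or optical duplicate" (SAM specification).
DUPLICATE_FLAG = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def count_reads(sam_lines, exclude_duplicates=False):
    """Count alignment records in SAM-formatted text lines.

    Header lines (starting with '@') are skipped. Records with the duplicate
    bit set are skipped only when exclude_duplicates is True.
    """
    n = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if exclude_duplicates and is_duplicate(flag):
            continue
        n += 1
    return n


# Hypothetical records for illustration only (not real data).
# read2 carries the same flags as read1 plus the duplicate bit: 99 + 1024 = 1123.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t255\t50M\t=\t200\t150\t*\t*",
    "read2\t1123\tchr1\t100\t255\t50M\t=\t200\t150\t*\t*",
]

print(count_reads(example))                           # 2: marked duplicates are kept
print(count_reads(example, exclude_duplicates=True))  # 1: marked duplicates are dropped
```

A tool that ignores the duplicate bit behaves like the first call, so it would give the same counts before and after marking.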
Thank you very much for the clarification François!
Hi, first of all thanks a lot for this incredibly useful repository.
I am following the GTEx v9 pipeline (using the branch, because the v8 run_rnaseqc.py gives me 0 counts for all exons/transcripts/genes) for expression quantification and eQTL mapping.
To see the effect of the MarkDuplicates step on quantification, I took an original BAM file and a version of it processed by MarkDuplicates (which, as far as I know, keeps all the reads and changes only the second column of the BAM file, the flag).
Then I ran "run_rnaseqc.py" on both the original BAM file and the MarkDuplicates output. When I compare the two sets of results, they seem to be identical.
Is this expected? Is the MarkDuplicates step not meant to affect expression quantification and later steps? If so, what is its purpose?
Thanks!