Closed WYSNI closed 6 years ago
Dear WYSNI, you are totally right: the duplicates are only marked, not removed. GATK, however, ignores reads that are marked as duplicates, so duplicate reads cannot influence the SNP ratio.
Here is the code snippet:
#mark PCR duplicates
#Helper.status("Remove Duplicates", self.rnaEdit.logFile,self.rnaEdit.textField)
markedFile=self.rnaEdit.params.output+".noDup.bam"
cmd=["java","-Xmx16G","-jar",self.rnaEdit.params.sourceDir + "picard-tools/MarkDuplicates.jar","INPUT=" + bamFile, "OUTPUT=" + markedFile, "METRICS_FILE="+self.rnaEdit.params.output+".pcr.metrics", "VALIDATION_STRINGENCY=LENIENT", "CREATE_INDEX=true"]
Helper.proceedCommand("Remove PCR duplicates", cmd, bamFile, markedFile, self.rnaEdit)
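To illustrate what "marked but ignored" means: marking sets the duplicate bit (0x400, decimal 1024) in the SAM FLAG field and leaves the read in the file, and a downstream tool can skip any read carrying that bit. The following is only a minimal sketch of such a filter in plain Python. It is not RNAEditor or GATK code, and the helper names and the example records are made up for illustration.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit that duplicate-marking tools set


def is_duplicate(sam_line):
    """Return True if the SAM record has the duplicate bit set (hypothetical helper)."""
    flag = int(sam_line.split("\t")[1])  # FLAG is the second SAM column
    return bool(flag & DUPLICATE_FLAG)


def skip_duplicates(sam_lines):
    """Yield alignment records that are not flagged as duplicates."""
    for line in sam_lines:
        if line.startswith("@"):  # header line, not a read
            continue
        if not is_duplicate(line):
            yield line


# Made-up records: read2 is a marked duplicate (FLAG 1024 = 0x400).
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tTTGA\tIIII",
]
kept = [line.split("\t")[0] for line in skip_duplicates(records)]
print(kept)  # ['read1', 'read3']
```

The marked read stays in the BAM file, so nothing is lost, but it never reaches the variant-calling step.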
We also tested samtools rmdup, which is why it is still in the code, but we found that GATK often crashed on the output of samtools because the files were incorrectly formed.
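If duplicates ever had to be physically dropped from the BAM file, Picard's MarkDuplicates accepts a REMOVE_DUPLICATES=true option, which would avoid a separate rmdup step. RNAEditor does not do this. The sketch below only shows how the command list from the snippet could be built with that option switched on; the function name and all file paths are hypothetical.

```python
def build_markduplicates_cmd(jar_path, bam_in, bam_out, metrics, remove=False):
    """Build a Picard MarkDuplicates command list (hypothetical helper).

    With remove=False duplicates are only flagged; with remove=True
    Picard is asked to leave them out of the output file.
    """
    cmd = [
        "java", "-Xmx16G", "-jar", jar_path,
        "INPUT=" + bam_in,
        "OUTPUT=" + bam_out,
        "METRICS_FILE=" + metrics,
        "VALIDATION_STRINGENCY=LENIENT",
        "CREATE_INDEX=true",
    ]
    if remove:
        cmd.append("REMOVE_DUPLICATES=true")
    return cmd


# Placeholder paths, for illustration only.
mark_only = build_markduplicates_cmd(
    "picard-tools/MarkDuplicates.jar", "sample.bam",
    "sample.noDup.bam", "sample.pcr.metrics")
mark_and_remove = build_markduplicates_cmd(
    "picard-tools/MarkDuplicates.jar", "sample.bam",
    "sample.noDup.bam", "sample.pcr.metrics", remove=True)
print(" ".join(mark_and_remove))
```

The resulting list can be passed to `subprocess.run` or to a wrapper such as the `Helper.proceedCommand` call in the snippet.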
Thanks for your interest in RNAEditor.
Best regards,
David John
Thanks a lot for your answer.
Best,
WYSNI
Hi David,
I have a question about the "Remove PCR duplicates" step:
In your code (see below), I think you only mark duplicate reads and do not remove them, because you only use the Picard tool MarkDuplicates. The rmdup step is commented out.
Am I wrong?
mark PCR duplicates
Where do you remove the duplicate reads?
Best,
WYSNI