djhn75 / RNAEditor

Remove Duplicates #16

Closed by WYSNI 6 years ago

WYSNI commented 6 years ago

Hi David,

I have a question about the "Remove PCR duplicates" step:

In your code (see below), I think you only mark duplicate reads rather than remove them, because you only run the Picard tool MarkDuplicates. The rmdup step is commented out.

Am I wrong?

    #mark PCR duplicates

    #Helper.status("Remove Duplicates", self.rnaEdit.logFile,self.rnaEdit.textField)
    markedFile=self.rnaEdit.params.output+".noDup.bam"
    cmd=["java","-Xmx16G","-jar",self.rnaEdit.params.sourceDir + "picard-tools/MarkDuplicates.jar","INPUT=" + bamFile, "OUTPUT=" + markedFile, "METRICS_FILE="+self.rnaEdit.params.output+".pcr.metrics", "VALIDATION_STRINGENCY=LENIENT", "CREATE_INDEX=true"]
    Helper.proceedCommand("Remove PCR duplicates", cmd, bamFile, markedFile, self.rnaEdit)

    """if self.rnaEdit.params.paired == False:
        pysam.rmdup("-s",bamFile,markedFile)
    else:
        pysam.rmdup(bamFile,markedFile)
    #pysam.rmdup(bamFile,markedFile)
    if self.rnaEdit.params.paired == False:
        cmd = [self.rnaEdit.params.sourceDir + "samtools", "rmdup", "-s", bamFile, markedFile]
    else:
        cmd = [self.rnaEdit.params.sourceDir + "samtools", "rmdup", bamFile, markedFile]
    Helper.proceedCommand("Index Bam File", cmd, bamFile, markedFile, self.rnaEdit)

    Helper.status("index Bam", self.rnaEdit.logFile,self.rnaEdit.textField)
    pysam.index(markedFile)

    cmd = [self.rnaEdit.params.sourceDir + "samtools", "index", bamFile]
    Helper.proceedCommand("Index Bam File", cmd, bamFile, markedFile+".bai", self.rnaEdit)
    #return bamFile"""

Where do you remove the duplicate reads?

Best,

WYSNI

djhn75 commented 6 years ago

Dear WYSNI, you are totally right: the duplicates are only marked, not removed. However, GATK ignores the marked reads afterward, so duplicate reads cannot influence the SNP ratio.

Here is the code snippet:

        #mark PCR duplicates
        #Helper.status("Remove Duplicates", self.rnaEdit.logFile,self.rnaEdit.textField)
        markedFile=self.rnaEdit.params.output+".noDup.bam"
        cmd=["java","-Xmx16G","-jar",self.rnaEdit.params.sourceDir + "picard-tools/MarkDuplicates.jar","INPUT=" + bamFile, "OUTPUT=" + markedFile, "METRICS_FILE="+self.rnaEdit.params.output+".pcr.metrics", "VALIDATION_STRINGENCY=LENIENT", "CREATE_INDEX=true"]
        Helper.proceedCommand("Remove PCR duplicates", cmd, bamFile, markedFile, self.rnaEdit)
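
A minimal sketch of why marking can be enough, assuming the downstream caller filters on the SAM duplicate flag (0x400, the bit that MarkDuplicates sets on duplicate reads). The read tuples and helper names are hypothetical and not part of RNAEditor or GATK:

```python
# Illustrative only: the flag value comes from the SAM specification;
# the (name, flag) pairs below are made-up examples, not RNAEditor data.
DUPLICATE_FLAG = 0x400  # bit set on reads marked as PCR or optical duplicates


def is_duplicate(flag):
    """Return True if the SAM FLAG has the duplicate bit set."""
    return bool(flag & DUPLICATE_FLAG)


def drop_marked_duplicates(reads):
    """Keep only reads whose FLAG lacks the duplicate bit.

    `reads` is an iterable of (name, flag) pairs (hypothetical layout).
    """
    return [(name, flag) for name, flag in reads if not is_duplicate(flag)]


reads = [
    ("read1", 99),                       # paired, properly mapped, not a duplicate
    ("read1_dup", 99 | DUPLICATE_FLAG),  # same flags plus the duplicate bit
    ("read2", 0),                        # unpaired, not a duplicate
]
kept = drop_marked_duplicates(reads)
print(kept)
```

The marked read stays in the BAM file, but any consumer that applies a filter like this never counts it.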

We also tested samtools rmdup, which is why it is still in the code, but we found that GATK often crashed after we used samtools because of incorrectly formed files.
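
If physically removing duplicates is required, Picard MarkDuplicates itself accepts a `REMOVE_DUPLICATES=true` option, which avoids samtools rmdup altogether. The sketch below is not what RNAEditor does; the function name, its parameters, and the example paths are assumptions made for illustration:

```python
def build_markduplicates_cmd(source_dir, bam_file, output_prefix, remove=False):
    """Build a Picard MarkDuplicates command list (hypothetical helper).

    With remove=False duplicates are only flagged; with remove=True
    Picard is asked to leave them out of the output BAM.
    """
    marked_file = output_prefix + ".noDup.bam"
    cmd = [
        "java", "-Xmx16G", "-jar",
        source_dir + "picard-tools/MarkDuplicates.jar",
        "INPUT=" + bam_file,
        "OUTPUT=" + marked_file,
        "METRICS_FILE=" + output_prefix + ".pcr.metrics",
        "VALIDATION_STRINGENCY=LENIENT",
        "CREATE_INDEX=true",
    ]
    if remove:
        # Assumption: the bundled Picard version supports this option.
        cmd.append("REMOVE_DUPLICATES=true")
    return cmd, marked_file


# Example paths are placeholders, not real RNAEditor locations.
cmd, marked_file = build_markduplicates_cmd("/opt/", "in.bam", "sample", remove=True)
print(" ".join(cmd))
```

The returned list has the same shape as the `cmd` passed to `Helper.proceedCommand` in the snippet above, so it could be handed over in the same way.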

Thanks for your interest in RNAEditor

Best regards, David John

WYSNI commented 6 years ago

Thanks a lot for your answer.

Best, WYSNI