dellytools / delly

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
BSD 3-Clause "New" or "Revised" License

Reason for filtering #204

Closed tyyiyi closed 4 years ago

tyyiyi commented 4 years ago

I am using Delly 0.8.1 to call SVs. My sample is a matched tumor/normal tissue pair sequenced to a depth of about 1000x. This is my script:

$delly call -g $ref -x $excl -o $dellyout/${tumor}_${normal}.bcf $tdata/$tumor.MarkDuplicates.bam $ndata/$normal.MarkDuplicates.bam
$delly filter -f somatic -a 0.05 -o $dellyout/${tumor}_${normal}.pre.bcf -s $tsv/8nt.tsv $dellyout/${tumor}_${normal}.bcf

The output of delly call, before running delly filter, contains the following deletion:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

chr3 41224321 DEL00003778 A <DEL> . PASS PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chr3;END=41224930;PE=888;MAPQ=60;CT=3to5;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=60;INSLEN=0;HOMLEN=2;SR=20;SRQ=1;CONSENSUS=TCCATTTTCTGCTCACTCCTCCTAATGGCTTGGTGAAATAGCAAACAAGCCACCAGCAGGAATCTAGTCTGGATGACTGCTTCTGGAGCCTGGATGCAGTACCATTCTTCCACTGATTCAGAGTGTTGAATTAACCTTTTCCAGATATTGATGGACA;CE=1.98657 GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV 0/1:-1000,0,-1000:10000:PASS:2392:4181:3824:1:2079:1467:1556:1476 0/0:0,-887.203,-1000:10000:PASS:2285:7518:3595:3:3957:0:2982:3

However, delly filter removed this deletion, even though it has been verified to be real. Why was it filtered out? And how is the allele frequency that -a is compared against calculated?

Can you also give me some suggestions on filtering? Should I run the delly filter step at all?

Next, I plan to perform structural variation detection on cfDNA samples paired with normal tumor samples. How should their filtering be set?

Thank you, Tang

tyyiyi commented 4 years ago

Does the -a 0.05 cutoff apply only to the tumor sample? Or must both the tumor sample and the normal sample be above 0.05 for a variant to pass?

tobiasrausch commented 4 years ago

If you plan to detect sub-clonal SVs or SVs in cfDNA, these often occur at extremely low allele frequency, which is presumably why you took a deep sequencing approach. You are right that -a is the allele frequency cutoff. For INFO/PRECISE variants the allele frequency is calculated as RV/(RR+RV). In your example above the tumor allele frequency is very high, almost 50% (1476/(1556+1476)). The variant was filtered because your control sample also supports it, albeit at a much lower allele frequency: 3/(2982+3). The problem you face is that your control is probably slightly contaminated with tumor, so you need to set the '-c' parameter to something like 0.01 to allow for this tumor-in-normal contamination.
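To make the arithmetic concrete, here is a minimal shell sketch that recomputes RV/(RR+RV) for both samples from the genotype columns of the DEL00003778 record. It only illustrates the formula quoted above and is not Delly's actual filtering code. The helper name `allele_fraction` is hypothetical, and the comparison directions against -a and -c (`>=` and `<=`) are assumptions.

```shell
#!/bin/sh
# FORMAT keys and genotype columns copied from the DEL00003778 record.
format="GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV"
tumor="0/1:-1000,0,-1000:10000:PASS:2392:4181:3824:1:2079:1467:1556:1476"
control="0/0:0,-887.203,-1000:10000:PASS:2285:7518:3595:3:3957:0:2982:3"

# Hypothetical helper: print RV/(RR+RV) for one sample column,
# locating RR and RV through the FORMAT keys (PRECISE variants only).
allele_fraction() {
    printf '%s\n%s\n' "$format" "$1" | awk -F: '
        NR == 1 { for (i = 1; i <= NF; i++) idx[$i] = i; next }
        {
            rr = $(idx["RR"]); rv = $(idx["RV"])
            if (rr + rv == 0) print "0.0000"
            else printf "%.4f\n", rv / (rr + rv)
        }'
}

tumor_af=$(allele_fraction "$tumor")
control_af=$(allele_fraction "$control")
echo "tumor AF:   $tumor_af"
echo "control AF: $control_af"

# Assumed comparison directions; Delly's exact rule may differ.
awk -v t="$tumor_af" -v c="$control_af" 'BEGIN {
    print "tumor AF >= 0.05 (-a):", (t >= 0.05 ? "yes" : "no")
    print "control AF <= 0.01 (-c):", (c <= 0.01 ? "yes" : "no")
}'

# The filter call from the question with the suggested -c added (not run here):
# $delly filter -f somatic -a 0.05 -c 0.01 -o $dellyout/${tumor}_${normal}.pre.bcf -s $tsv/8nt.tsv $dellyout/${tumor}_${normal}.bcf
```

The tumor fraction comes out near 0.49 and the control fraction near 0.001, so a contamination allowance of 0.01 would tolerate the three variant-supporting reads in the control.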

tyyiyi commented 4 years ago

Thanks a lot, I will try again.

Tang