bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License

Ensemble.sSNV.tsv is missing some VarDict variants. #88

Closed (jeongmeani closed this issue 4 years ago)

jeongmeani commented 4 years ago

Hi,

I analyzed WES data with SomaticSeq. I ran all the callers that SomaticSeq supports and then ran the merge script.

However, there is a problem: the output 'Ensemble.sSNV.tsv' does not match the VarDict output.

For example,

VarDict.vcf has 'chr1 | 3712588 | . | G | A'

but in Ensemble.sSNV.tsv, if_VarDict is '0'.

Was the variant filtered out by the script?

Best Regards

Jeongmin

litaifang commented 4 years ago

if_VarDict should be 1 if that variant has PASS and Somatic in VarDict's VCF file.
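The condition described above can be sketched as a small check on a single VCF record. This is only an illustration, not SomaticSeq's actual code: the function name `is_pass_and_somatic` is hypothetical, and it assumes that "PASS" means the FILTER column contains PASS and that "Somatic" means the INFO `STATUS` value contains the word Somatic (for example `StrongSomatic`).

```python
def is_pass_and_somatic(vcf_line: str) -> bool:
    """Hypothetical helper: True if a VCF data line has PASS in FILTER
    and a STATUS value in INFO that contains 'Somatic'."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        # Not a complete VCF data line (CHROM..INFO are the first 8 columns).
        return False

    filters = fields[6].split(";")

    # Parse INFO into a dict; flag-style entries without '=' get an empty value.
    info = {}
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value

    return "PASS" in filters and "Somatic" in info.get("STATUS", "")


# Shortened version of the record discussed in this thread.
record = "chr1\t3712588\t.\tG\tA\t103\tPASS\tSTATUS=StrongSomatic;SAMPLE=TUMOR;TYPE=SNV"
print(is_pass_and_somatic(record))
```

A record that passes this check but still shows if_VarDict as 0 suggests the variant was lost in an earlier step, before the caller flags were assigned.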

jeongmeani commented 4 years ago

yes,

the variant info is chr1 3712588 . G A 103 PASS STATUS=StrongSomatic;SAMPLE=TUMOR;TYPE=SNV;DP=91;VD=12;AF=0.1319;SHIFT3=0;MSI=2.000;MSILEN=2;SSF=0.07143;SOR=0;LSEQ=AGGCTCTGCAGGCGCGGGGC;RSEQ=CAGCGCGCCAGGTCGGCTGG GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/1:91:12:5,7:37,42:79,12:0.1319:2,2:38:1:29:1:0.76749:1.23051:60:11:0.131:0.011:1.7 0/0:21:0:0,0:8,13:21,0:0:2,0:36.4:1:33.6:1:1:0:60:9.5:1:0:0.6

but Ensemble.sSNV.tsv has if_VarDict:0

Jeongmin

litaifang commented 4 years ago

You can send me some of the two files if you don't mind. Include the line in question, plus 10 lines before and after, and also include the headers. You may send them to li_tai.fang@roche.com.

jeongmeani commented 4 years ago

Hi,

I sent you the files by e-mail.

I think the problem is in the 'bed_intersector' step for VarDict, because intersect.vardict.vcf does not contain the chr1 3712588 variant that I described in this issue.

Thank you for your help.

Jeongmin
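The comparison described in this comment (the variant is present in VarDict.vcf but absent from intersect.vardict.vcf) can be reproduced with a short script. The sketch below is not part of SomaticSeq; `has_variant` is a hypothetical helper, and the in-memory VCF text is toy data standing in for the real files.

```python
import io


def has_variant(vcf_handle, chrom, pos, ref=None, alt=None):
    """Hypothetical helper: True if the VCF stream contains a record at
    chrom:pos, optionally also matching REF and one of the ALT alleles."""
    for line in vcf_handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        if fields[0] != chrom or fields[1] != str(pos):
            continue
        if ref is not None and fields[3] != ref:
            continue
        if alt is not None and alt not in fields[4].split(","):
            continue
        return True
    return False


# Toy stand-ins for VarDict.vcf and intersect.vardict.vcf.
HEADER = "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
VCF_TEXT = HEADER + "chr1\t3712588\t.\tG\tA\t103\tPASS\tSTATUS=StrongSomatic\n"

print(has_variant(io.StringIO(VCF_TEXT), "chr1", 3712588, "G", "A"))
print(has_variant(io.StringIO(HEADER), "chr1", 3712588, "G", "A"))
```

With real files, the same function can be called on handles from `open("VarDict.vcf")` and `open("intersect.vardict.vcf")` to see at which step the record disappears.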

litaifang commented 4 years ago

I figured out why. When VarDict outputs symbolic alleles such as \<DUP>, \<DEL>, \<INV>, etc. in the VCF file, those records do not have an END=xxx field in INFO. bedtools requires that field because END tells it where the region ends, so bedtools does not run to completion and the records after that point never reach the intersected file. Let me modify my code to get around that issue.

litaifang commented 4 years ago

I incorporated the fix into the "latest" branch. Will move that into the main branch when I've tested it more extensively.

jeongmeani commented 4 years ago

Oh, great! I hope it works.

Thank you for your help.

Jeongmin

litaifang commented 4 years ago

Fixed by including a routine that removes incompatible lines from VarDict's VCF files before running bedtools on them.
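A routine of this kind might look like the following sketch. It is not the code that was committed to SomaticSeq; the function names are hypothetical, and it assumes that an "incompatible" line is a record whose ALT is a symbolic allele (written in angle brackets) and whose INFO has no END key, as described earlier in this thread. The example records are invented toy data.

```python
import io


def is_bedtools_incompatible(vcf_line):
    """Hypothetical check: symbolic ALT allele (e.g. <DUP>) with no END in INFO."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False
    alts = fields[4].split(",")
    has_symbolic = any(a.startswith("<") and a.endswith(">") for a in alts)
    info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
    return has_symbolic and "END" not in info_keys


def remove_incompatible_lines(infile, outfile):
    """Copy a VCF stream, dropping incompatible records. Returns the number dropped."""
    removed = 0
    for line in infile:
        if not line.startswith("#") and is_bedtools_incompatible(line):
            removed += 1
            continue
        outfile.write(line)
    return removed


# Toy input: one SNV, one <DUP> without END, one <DEL> with END.
vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t3712588\t.\tG\tA\t103\tPASS\tSTATUS=StrongSomatic;TYPE=SNV\n"
    "chr1\t5000000\t.\tA\t<DUP>\t120\tPASS\tSTATUS=StrongSomatic;TYPE=DUP\n"
    "chr1\t6000000\t.\tC\t<DEL>\t110\tPASS\tSTATUS=StrongSomatic;TYPE=DEL;END=6000500\n"
)

cleaned = io.StringIO()
removed = remove_incompatible_lines(io.StringIO(vcf_text), cleaned)
print(removed)
print(cleaned.getvalue())
```

Dropping these records trades a small loss of structural-variant calls for a file that bedtools can process to the end; an alternative design would be to compute and add an END value instead, but that is not what the thread describes.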