Closed pascalg closed 9 years ago
It looks like the "AF" (allele frequency, e.g. 0.5 or 1) INFO field is not emitted by VarScan, so I'll have to modify this part of CNVkit to handle it.
Could you post a couple of lines from your VarScan VCF file here? I don't need the whole thing, just a few lines and ideally the header too, if you can.
Thanks, here it is:
##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
CHROM,FROM,REF,ALT,-,-,-,FILTER,-
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
1 69511 . A G . PASS DP=50;SS=1;SSC=0;GPV=9.9117E-30;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:20:0:20:100%:0,0,15,5 1/1:.:30:0:30:100%:0,0,23,7
1 139213 . A G . PASS DP=19;SS=1;SSC=5;GPV=5.9414E-9;SPV=2.6316E-1 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:9:0:9:100%:0,0,3,6 1/1:.:10:2:8:80%:1,1,1,7
1 762273 . G A . PASS DP=109;SS=1;SSC=0;GPV=4.3979E-65;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:54:0:54:100%:0,0,8,46 1/1:.:55:0:55:100%:0,0,9,46
I've just merged a pull request (#11) that should fix this. Can you try it now?
Thanks, it is working now. However, I am getting different chromosomes with significant LOH shift when I use "scatter" or "loh" with the same VCF file. Shouldn't the result be the same? From the plot I think "scatter" is wrong as I cannot see why it marks the chrosomome significant (all VAFs <0.8).
I agree with your assessment of it. I'll take a closer look.
The loh
command had an option --min-depth
which filtered out SNVs with depth < 20, while scatter
was missing this option and did no filtering. Fixed in: bbe9af2c16410693dd71d946806388044e7801d6
The deeper problem was that the test statistic I used was garbage, so I disabled it here: b339ca1d80ac0a07424dbbd3191f1d70a957afa9
I will at some point implement a better test for LOH and then re-enable reporting and colorization in both plots.
The LOH command is not working for me, it doesn't seem to like my VarScan VCF file. Any ideas?
Thanks