dkoboldt / varscan

Variant calling and somatic mutation/CNV detection for next-generation sequencing data

sanity error when running bcftools stats on varscan data #25

Open splaisan opened 6 years ago

splaisan commented 6 years ago

Hi, I ran VarScan mpileup2cns (v2.4.3) on some data, and when analysing the resulting VCF with bcftools stats I got the following error:

$ bcftools stats --fasta-ref   Ref.fasta variants.vcf.gz  > vcf.stats
Sanity check failed, the reference sequence differs: NC_030973.1:129900+2 .. G vs Y
samtools faidx R64_SEUB3.0_merged.fasta NC_030973.1:129900-129902

After inspecting this closely (https://github.com/samtools/bcftools/issues/691#issuecomment-337218492), @pd3 concluded that VarScan does not handle IUPAC ambiguity codes in the reference correctly: for the reference bases CAY it should have written CAC in the REF field, not the CAG it returned.
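To illustrate the expected behaviour, here is a minimal Python sketch. It assumes the rule as I read it in the VCF 4.3 specification: an ambiguous reference base is reduced to the first base, alphabetically, that it stands for, so Y (C or T) becomes C. The function names `vcf_ref_allele` and `ref_is_compatible` are hypothetical and are not taken from VarScan or bcftools; the compatibility check is only a rough approximation of the idea behind the sanity check, not bcftools' actual code.

```python
# IUPAC nucleotide codes mapped to the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def vcf_ref_allele(fasta_seq):
    """Reduce a FASTA reference string to bases allowed in a VCF REF field.

    A, C, G, T and N are kept as they are; any other IUPAC code is replaced
    by the alphabetically first base it can represent (assumed rule).
    Characters outside the IUPAC table raise KeyError.
    """
    reduced = []
    for base in fasta_seq.upper():
        if base in "ACGTN":
            reduced.append(base)
        else:
            reduced.append(min(IUPAC[base]))
    return "".join(reduced)


def ref_is_compatible(vcf_ref, fasta_seq):
    """Return True if every REF base is one of the bases its FASTA code allows."""
    if len(vcf_ref) != len(fasta_seq):
        return False
    return all(
        v in IUPAC[f]
        for v, f in zip(vcf_ref.upper(), fasta_seq.upper())
    )


print(vcf_ref_allele("CAY"))            # expected REF for the CAY reference
print(ref_is_compatible("CAG", "CAY"))  # the REF that VarScan wrote
print(ref_is_compatible("CAC", "CAY"))  # the REF suggested in the linked comment
```

With the values from this report, the sketch reduces CAY to CAC and flags CAG as incompatible, because G is not one of the bases that Y can stand for.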

Could you please confirm this and, if possible, fix the handling of IUPAC codes in the REF field? Thanks in advance.

## reference info
$ samtools faidx R64_SEUB3.0_merged.fasta NC_030973.1:129900-129902
NC_030973.1:129900-129902
CAY

$ tabix variants.vcf.gz NC_030973.1:129900-129902
NC_030973.1 129900  .   CAG C   .   PASS    ADP=22;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR   0/1:49:33:22:10:13:48.15%:1.1242E-5:23:12:6:4:3:10