arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

No read depth and genotype quality when using platypus #847

Closed ehitchcock closed 7 years ago

ehitchcock commented 7 years ago

Hello,

I am not able to filter variants based on depth or genotype quality. I used Platypus to call my variants. I saw in previously posted issues that this problem is due to GEMINI only supporting this information from GATK and FreeBayes.

Could you please tell me how this information is supposed to be reported in my VCF file?

Thank you very much!

Below are the header of my VCF and one SNP record from it:

fileformat=VCFv4.0

FILTER=

fileDate=2017-05-11

source=Platypus_Version_0.8.1

platypusOptions={'assemblyRegionSize': 1500, 'trimReadFlank': 0, 'assembleBadReads': 1, 'bamFiles': ['CAU7EANXX_1_CGATGT_A1335.bam'], 'minVarDist': 9, 'trimSoftClipped': 1, 'minReads': 2, 'qualBinSize': 1, 'refFile': 'GRCh37-lite.fa', 'maxHaplotypes': 50, 'filterVarsByCoverage': 1, 'maxSize': 1500, 'originalMaxHaplotypes': 50, 'skipDifficultWindows': 0, 'parseNCBI': 0, 'skipRegionsFile': None, 'noCycles': 0, 'trimAdapter': 1, 'minPosterior': 5, 'assembleAll': 1, 'trimOverlapping': 1, 'filterDuplicates': 1, 'abThreshold': 0.001, 'minFlank': 10, 'bufferSize': 100000, 'fileCaching': 0, 'useEMLikelihoods': 0, 'coverageSamplingLevel': 30, 'calculateFlankScore': 0, 'logFileName': 'CAU7EANXX_1_CGATGT_A1335.platvarlog.txt', 'nCPU': 1, 'filterReadsWithUnmappedMates': 1, 'qdThreshold': 10, 'maxVariants': 8, 'scThreshold': 0.95, 'filterReadsWithDistantMates': 1, 'maxReads': 5000000, 'badReadsWindow': 11, 'genIndels': 1, 'largeWindows': 0, 'minMapQual': 20, 'maxVarDist': 15, 'maxGOF': 30, 'rlen': 150, 'minGoodQualBases': 20, 'refCallBlockSize': 1000, 'countOnlyExactIndelMatches': 0, 'longHaps': 0, 'HLATyping': 0, 'filterReadPairsWithSmallInserts': 1, 'minBaseQual': 20, 'getVariantsFromBAMs': 1, 'genSNPs': 1, 'assemble': 0, 'assemblerKmerSize': 15, 'minVarFreq': 0.05, 'alignScoreFile': '', 'verbosity': 2, 'sourceFile': None, 'compressReads': 0, 'rmsmqThreshold': 40, 'filteredReadsFrac': 0.7, 'outputRefCalls': 0, 'badReadsThreshold': 15, 'hapScoreThreshold': 4, 'regions': None, 'sbThreshold': 0.001, 'output': 'CAU7EANXX_1_CGATGT_A1335_Platypus.vcf', 'assembleBrokenPairs': 0, 'mergeClusteredVariants': 1, 'maxGenotypes': 1275, 'nInd': 1}

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

FILTER=

FILTER=

FILTER=

FILTER=

FILTER=

FILTER=

FILTER=

FILTER=

FILTER=

FILTER=

FILTER=

FILTER=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

INFO=

INFO=

SnpEffVersion="4.3i (build 2016-12-15 22:33), by Pablo Cingolani"

SnpEffCmd="SnpEff -i vcf -o vcf GRCh37.75 /Users/ehitchcock/A1335/CAU7EANXX_1_CGATGT_A1335_Platypus_normalized.vcf "

INFO=

INFO=

INFO=

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

10 105128134 . T G 926.0 PASS BRF=0.5;FR=0.5;HP=3;HapScore=2;MGOF=5;MMLQ=34;MQ=58.73;NF=10;NR=24;PP=926;QD=28.258;SC=AGTGGCGGGCTCCGGAGCCCC;SbPval=0.53;Source=Platypus;TC=80;TCF=23;TCR=57;TR=34;WE=105128142;WS=105128124;ANN=G|missense_variant|MODERATE|TAF5|ENSG00000148835|transcript|ENST00000369839|protein_coding|1/11|c.388T>G|p.Ser130Ala|411/3268|388/2403|130/800||,G|missense_variant|MODERATE|TAF5|ENSG00000148835|transcript|ENST00000351396|protein_coding|1/10|c.388T>G|p.Ser130Ala|411/3099|388/2238|130/745|GT:GL:GOF:GQ:NR:NV 1/0:-96.65,0,-105.05:5:99:80:34

(cc: @Phillip-a-richmond)

brentp commented 7 years ago

As you've found, Platypus is not supported. We support RO/AO (as in FreeBayes) or AD (as in GATK and the VCF spec) for REF and ALT depths. I'm not sure why GQ is not supported, but it does look like your header is missing the sample name (after "FORMAT").
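One possible workaround for the two points above is to rewrite the VCF before loading it. The sketch below derives an AD entry from the Platypus NR and NV fields and checks whether the `#CHROM` line carries a sample name. It assumes that NR is the total number of reads covering the site and NV is the number of reads containing the variant, so the REF depth is approximated as NR - NV. The helper names `has_sample_column` and `add_ad` are hypothetical and are not part of GEMINI or Platypus, and this thread does not confirm that GEMINI picks up the resulting AD field.

```python
# Hypothetical helpers for illustration; not part of GEMINI or Platypus.

# Header line to add alongside the other FORMAT definitions.
# Number=. is used here because the file declares VCFv4.0.
AD_HEADER = ('##FORMAT=<ID=AD,Number=.,Type=Integer,'
             'Description="Ref,alt depths derived from Platypus NR/NV">')


def has_sample_column(chrom_line):
    """Return True if the #CHROM line names at least one sample after FORMAT."""
    # The nine fixed columns are CHROM..FORMAT; samples come after them.
    return len(chrom_line.split()) > 9


def add_ad(record):
    """Append AD=ref,alt to every sample of one tab-separated VCF record.

    Assumption: NR = total reads at the site, NV = reads with the variant,
    so ref depth is approximated as NR - NV. Only the first NR/NV value is
    used, so this sketch is meant for biallelic records.
    """
    fields = record.rstrip("\n").split("\t")
    if len(fields) < 10:
        return "\t".join(fields)  # no FORMAT/sample columns to edit

    keys = fields[8].split(":")
    if "AD" in keys or "NR" not in keys or "NV" not in keys:
        return "\t".join(fields)  # nothing to derive, or AD already present

    nr_i, nv_i = keys.index("NR"), keys.index("NV")
    fields[8] = ":".join(keys + ["AD"])

    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if len(values) != len(keys):
            continue  # sample with dropped trailing fields; leave it as-is
        try:
            total = int(values[nr_i].split(",")[0])
            alt = int(values[nv_i].split(",")[0])
            ad = f"{max(total - alt, 0)},{alt}"
        except ValueError:
            ad = "."  # missing or non-numeric NR/NV
        fields[i] = ":".join(values + [ad])
    return "\t".join(fields)
```

For the record posted above (NR=80, NV=34), `add_ad` would append `46,34` to the sample column and `AD` to the FORMAT column. Reads that support neither allele end up counted as REF under this approximation, so treat the derived depths as rough.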

ehitchcock commented 7 years ago

Okay, thank you.