Open Acetyl-ZHOU opened 9 months ago
Hi, it seems that there is something wrong with one of your vcf files or ids from the RNA BAM files. In which step exactly did this happen?
I want to run `snakemake --cores all sampleAnnotation mae` and I get that problem.
My VCF and BAM files are also hg19.
This is my config.yaml:
`projectTitle: "Detection of RNA Outliers Pipeline"
root: /Users/huizhou/Documents/1_allelic/drop_demo/RNAdata/Output # root directory of all output objects and tables
htmlOutputPath: /Users/huizhou/Documents/1_allelic/drop_demo/RNAdata/Output/html # path for HTML rendered reports
indexWithFolderName: true # whether the root base name should be part of the index name
hpoFile: null # if null, downloads it from webserver
sampleAnnotation: /Users/huizhou/Documents/1_allelic/drop_demo/RNAdata/sampleAnnotation.tsv # path to sample annotation (see documentation on how to create it)
geneAnnotation:
  v45: /Users/huizhou/Documents/1_allelic/drop_demo/gencode.v45lift37.basic.annotation.gtf
genomeAssembly: hg19
genome: /Users/huizhou/Documents/1_allelic/drop_demo/hg19.fa
exportCounts:
  # groups to exclude when exporting counts
  geneAnnotations:
    - v45
  excludeGroups:
    - null
aberrantExpression:
  run: false
  groups:
aberrantSplicing:
  run: false
  groups:
  FRASER_version: "FRASER"
  deltaPsiCutoff : 0.3
  quantileForFiltering: 0.95
mae:
  run: true
  groups:
  qcVcf: /Users/huizhou/Documents/1_allelic/drop_demo/qc_vcf_1000G_hg19.vcf.gz
  qcGroups:
rnaVariantCalling:
  run: true
  groups:
tools:
  gatkCmd: gatk
  bcftoolsCmd: bcftools
  samtoolsCmd: samtools`
I also found a problem with this:
[W::vcf_parse_filter] FILTER 'VarFreq,VarMapQual,MinMMQSdiff' is not defined in the header
[E::bcf_hdr_parse_line] Could not parse the header line: "##FILTER=<ID=VarFreq,VarMapQual,MinMMQSdiff,Description=\"Dummy\">"
[E::vcf_parse_filter] Could not add dummy header for FILTER 'VarFreq,VarMapQual,MinMMQSdiff' at chr1:10048
[W::vcf_parse_filter] FILTER 'NoReadCounts' is not defined in the header
[W::vcf_parse_filter] FILTER 'MinMMQSdiff' is not defined in the header
What can I do about the header?
Hi, be sure to double-check the format of your VCF files, for example:
In my DNA VCF file, the header looks like this:
##INFO=<ID=ADP,Number=1,Type=Integer,Description="Average per-sample depth of bases with Phred score >= 15">
##INFO=<ID=WT,Number=1,Type=Integer,Description="Number of samples called reference (wild-type)">
##INFO=<ID=HET,Number=1,Type=Integer,Description="Number of samples called heterozygous-variant">
##INFO=<ID=HOM,Number=1,Type=Integer,Description="Number of samples called homozygous-variant">
##INFO=<ID=NC,Number=1,Type=Integer,Description="Number of samples not called">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Raw Read Depth as reported by SAMtools">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Quality Read Depth of bases with Phred score >= 15">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=PVAL,Number=1,Type=String,Description="P-value from Fisher's Exact Test"
##FILTER=<ID=VarCount,Description="Fewer than 4 variant-supporting reads">
##FILTER=<ID=VarFreq,Description="Variant allele frequency below 0.05">
##FILTER=<ID=VarReadPos,Description="Relative average read position < 0.01">
##FILTER=<ID=VarDist3,Description="Average distance to effective 3' end < 0.01">
##FILTER=<ID=VarMMQS,Description="Average mismatch quality sum for variant reads > 100">
##FILTER=<ID=VarMapQual,Description="Average mapping quality of variant reads < 15">
##FILTER=<ID=VarBaseQual,Description="Average base quality of variant reads < 28">
##FILTER=<ID=Strand,Description="Strand representation of variant reads < 0.01">
##FILTER=<ID=RefMapQual,Description="Average mapping quality of reference reads < 15">
##FILTER=<ID=RefBaseQual,Description="Average base quality of reference reads < 28">
##FILTER=<ID=MMQSdiff,Description="Mismatch quality sum difference (ref - var) > 50">
##FILTER=<ID=MapQualDiff,Description="Mapping quality difference (ref - var) > 50">
##FILTER=<ID=ReadLenDiff,Description="Average supporting read length difference (ref - var) > 0.25">
Do I need to add this information to the sample annotation, or change the VCF file?
It seems that `MinMMQSdiff` is not defined in the VCF file header. You would need to modify your VCF files.
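To see exactly which FILTER values are missing from the header, a small script can compare the `##FILTER` declarations against the values used in the records. The following is a minimal sketch, not part of DROP; the function names `open_text` and `undeclared_filters` are made up for illustration. It splits the FILTER column on `;` only, as the VCF specification describes, so a comma-joined value such as `VarFreq,VarMapQual,MinMMQSdiff` is reported as one undeclared ID, which matches the bcftools warning above.

```python
import gzip
import re


def open_text(path):
    """Open a plain or gzip/bgzip-compressed VCF in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def undeclared_filters(path):
    """Return FILTER values used in records but not declared in the header.

    Hypothetical helper: FILTER entries are split on ';' only, so
    comma-joined names show up as a single undeclared value.
    """
    declared, used = set(), set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("##FILTER="):
                match = re.match(r"##FILTER=<ID=([^,>]+)", line)
                if match:
                    declared.add(match.group(1))
            elif not line.startswith("#"):
                fields = line.rstrip("\n").split("\t")
                if len(fields) > 6:  # column 7 is FILTER
                    used.update(fields[6].split(";"))
    return sorted(used - declared - {"PASS", "."})
```

Calling `undeclared_filters("my_sample.vcf.gz")` (the file name is a placeholder) returns a sorted list of the offending values.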
So can I just write something for `MinMMQSdiff`? I don't know how to add this to my VCF.
Consider validating your VCF files beforehand, for example with GATK's ValidateVariants restricted to the VCF format tests only: https://gatk.broadinstitute.org/hc/en-us/articles/360037057272-ValidateVariants
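As for how the header could be patched, here is a rough sketch rather than an official DROP or bcftools procedure. It rests on two assumptions: that the comma-joined FILTER values (e.g. `VarFreq,VarMapQual,MinMMQSdiff`) are meant as several separate filters, which the VCF specification expects to be separated by `;`, and that a placeholder description is acceptable for IDs whose real meaning is unknown. The function name `patch_filters` is hypothetical.

```python
import re


def patch_filters(lines):
    """Return VCF lines with ';'-separated FILTER values and a ##FILTER
    header line for every filter ID that was not declared.

    Assumption: commas in the FILTER column separate multiple filters.
    """
    declared = set()
    for line in lines:
        match = re.match(r"##FILTER=<ID=([^,>]+)", line)
        if match:
            declared.add(match.group(1))

    header, records, used = [], [], []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6:  # column 7 is FILTER
            names = [name for name in re.split(r"[;,]", fields[6]) if name]
            fields[6] = ";".join(names) if names else "."
            for name in names:
                if name not in used:
                    used.append(name)
        records.append("\t".join(fields) + "\n")

    missing = [n for n in used if n not in declared and n not in ("PASS", ".")]
    new_lines = [
        # Placeholder text: replace with the real meaning if it is known.
        '##FILTER=<ID=%s,Description="Placeholder: not defined in original header">\n' % n
        for n in missing
    ]
    # New declarations go right before the #CHROM column line.
    pos = next((i for i, l in enumerate(header) if l.startswith("#CHROM")), len(header))
    return header[:pos] + new_lines + header[pos:] + records
```

The patched lines would then need to be written to a new file, compressed with bgzip, and indexed again with tabix so that the index is not older than the data file.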
Hey @Acetyl-ZHOU, any updates on the issue?
I ran the mae module, but ran into a problem:
[W::hts_idx_load2] The index file is older than the data file: /Users/huizhou/Documents/1_allelic/drop_demo/qc_vcf_1000G_hg19.vcf.gz.tbi
Error: BiocParallel errors
2 remote errors, element index: 1, 2
0 unevaluated and other errors
first remote error: invalid class “ScanVcfParam” object: ScanVcfParam: 'geno' cannot be specified if 'samples' is 'NA'
Execution halted
Can you give me some help?
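The `'samples' is 'NA'` message suggests that no sample name could be resolved for a VCF being read. One possible cause, which is an assumption and not confirmed in this thread, is that a `DNA_ID` in the sample annotation does not match any sample column in the corresponding VCF. The sketch below compares the two; it assumes the annotation is a tab-separated file with `DNA_ID` and `DNA_VCF_FILE` columns, and the helper names are made up for illustration.

```python
import csv
import gzip


def vcf_samples(path):
    """Return the sample names from the #CHROM line of a VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are fixed; sample names start at column 10.
                return line.rstrip("\n").split("\t")[9:]
    return []


def mismatched_dna_ids(annotation_tsv):
    """List (DNA_ID, VCF path, VCF samples) where the ID is not in the VCF.

    Assumes 'DNA_ID' and 'DNA_VCF_FILE' columns in the annotation table.
    """
    problems = []
    with open(annotation_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            vcf, dna_id = row.get("DNA_VCF_FILE"), row.get("DNA_ID")
            if not vcf or not dna_id:
                continue  # rows without DNA data are skipped
            samples = vcf_samples(vcf)
            if dna_id not in samples:
                problems.append((dna_id, vcf, samples))
    return problems
```

An empty list from `mismatched_dna_ids("sampleAnnotation.tsv")` would rule out this particular cause; the index warning for the QC VCF is a separate matter.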