gagneurlab / drop

Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders
MIT License

error in mae #518

Open Acetyl-ZHOU opened 7 months ago

Acetyl-ZHOU commented 7 months ago

I ran the MAE module but hit a problem:

[W::hts_idx_load2] The index file is older than the data file: /Users/huizhou/Documents/1_allelic/drop_demo/qc_vcf_1000G_hg19.vcf.gz.tbi
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error: invalid class “ScanVcfParam” object: ScanVcfParam: 'geno' cannot be specified if 'samples' is 'NA'
Execution halted

Can you give me some help?

vyepez88 commented 7 months ago

Hi, it seems that there is something wrong with one of your vcf files or ids from the RNA BAM files. In which step exactly did this happen?
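
One way to narrow this down is to check whether each DNA ID in the sample annotation actually appears as a sample column in its VCF file. The following is a minimal sketch, not part of DROP itself; it assumes the sample annotation is a tab-separated file with `DNA_ID` and `DNA_VCF_FILE` columns, and the function names are made up for illustration.

```python
import csv
import gzip


def vcf_sample_names(vcf_path):
    """Return the sample names from the #CHROM header line of a VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed; sample names follow them.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without seeing a #CHROM line
    return []


def missing_dna_ids(annotation_path):
    """List (DNA_ID, VCF path) pairs whose DNA_ID is absent from the VCF.

    Assumes (hypothetically) that the annotation has DNA_ID and
    DNA_VCF_FILE columns; rows lacking either value are skipped.
    """
    problems = []
    with open(annotation_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            dna_id, vcf = row.get("DNA_ID"), row.get("DNA_VCF_FILE")
            if not dna_id or not vcf:
                continue
            if dna_id not in vcf_sample_names(vcf):
                problems.append((dna_id, vcf))
    return problems
```

Calling `missing_dna_ids("sampleAnnotation.tsv")` returns an empty list when every ID is found. Any pair it returns points at an ID/VCF mismatch, which is one plausible cause of the `'samples' is 'NA'` error above.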

Acetyl-ZHOU commented 7 months ago

I wanted to run snakemake --cores all sampleAnnotation mae and got that problem. My VCF and BAM files are also hg19. This is my config.yaml:

`projectTitle: "Detection of RNA Outliers Pipeline"
root: /Users/huizhou/Documents/1_allelic/drop_demo/RNAdata/Output # root directory of all output objects and tables
htmlOutputPath: /Users/huizhou/Documents/1_allelic/drop_demo/RNAdata/Output/html # path for HTML rendered reports
indexWithFolderName: true # whether the root base name should be part of the index name

hpoFile: null # if null, downloads it from webserver
sampleAnnotation: /Users/huizhou/Documents/1_allelic/drop_demo/RNAdata/sampleAnnotation.tsv # path to sample annotation (see documentation on how to create it)

geneAnnotation:
    v45: /Users/huizhou/Documents/1_allelic/drop_demo/gencode.v45lift37.basic.annotation.gtf
genomeAssembly: hg19
genome: /Users/huizhou/Documents/1_allelic/drop_demo/hg19.fa

exportCounts:
    # specify which gene annotations to include and which
    # groups to exclude when exporting counts
    geneAnnotations:
        - v45
    excludeGroups:
        - null

aberrantExpression:
    run: false
    groups:

aberrantSplicing:
    run: false
    groups:

mae:
    run: true
    groups:

rnaVariantCalling:
    run: true
    groups:

tools:
    gatkCmd: gatk
    bcftoolsCmd: bcftools
    samtoolsCmd: samtools`

Acetyl-ZHOU commented 7 months ago

I also found another problem:

[W::vcf_parse_filter] FILTER 'VarFreq,VarMapQual,MinMMQSdiff' is not defined in the header
[E::bcf_hdr_parse_line] Could not parse the header line: "##FILTER=<ID=VarFreq,VarMapQual,MinMMQSdiff,Description=\"Dummy\">"
[E::vcf_parse_filter] Could not add dummy header for FILTER 'VarFreq,VarMapQual,MinMMQSdiff' at chr1:10048
[W::vcf_parse_filter] FILTER 'NoReadCounts' is not defined in the header
[W::vcf_parse_filter] FILTER 'MinMMQSdiff' is not defined in the header

What can I do about the header?
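
To see which FILTER values trigger these messages, the FILTER column can be compared against the IDs declared in the `##FILTER` header lines. The sketch below is illustrative only (the function name is hypothetical). It relies on the VCF convention that multiple filters in a record are separated by semicolons, so a comma-joined value such as `VarFreq,VarMapQual,MinMMQSdiff` is treated as one unknown filter name, which matches what the parser reports above.

```python
import re


def undefined_filters(vcf_lines):
    """Count FILTER values that are used in records but not declared.

    `vcf_lines` is any iterable of VCF text lines (header first).
    Returns a dict mapping each undeclared filter name to its count.
    """
    defined = {"PASS", "."}  # always valid without a header entry
    undefined = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##FILTER="):
            match = re.match(r"##FILTER=<ID=([^,>]+)", line)
            if match:
                defined.add(match.group(1))
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            if len(fields) < 7:
                continue  # malformed record, no FILTER column
            # Per the VCF convention, filter names are separated by ';'.
            for name in fields[6].split(";"):
                if name not in defined:
                    undefined[name] = undefined.get(name, 0) + 1
    return undefined
```

For a compressed file, the lines can be supplied with `gzip.open(path, "rt")`. Names containing commas in the result suggest that the FILTER column uses the wrong separator, not just that a header line is missing.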

vyepez88 commented 7 months ago

Hi, be sure to double-check the format of your VCF files, for example:

Acetyl-ZHOU commented 7 months ago

In my DNA VCF file the header is like this.

##INFO=<ID=ADP,Number=1,Type=Integer,Description="Average per-sample depth of bases with Phred score >= 15">
##INFO=<ID=WT,Number=1,Type=Integer,Description="Number of samples called reference (wild-type)">
##INFO=<ID=HET,Number=1,Type=Integer,Description="Number of samples called heterozygous-variant">
##INFO=<ID=HOM,Number=1,Type=Integer,Description="Number of samples called homozygous-variant">
##INFO=<ID=NC,Number=1,Type=Integer,Description="Number of samples not called">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Raw Read Depth as reported by SAMtools">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Quality Read Depth of bases with Phred score >= 15">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=PVAL,Number=1,Type=String,Description="P-value from Fisher's Exact Test"

##FILTER=<ID=VarCount,Description="Fewer than 4 variant-supporting reads">
##FILTER=<ID=VarFreq,Description="Variant allele frequency below 0.05">
##FILTER=<ID=VarReadPos,Description="Relative average read position < 0.01">
##FILTER=<ID=VarDist3,Description="Average distance to effective 3' end < 0.01">
##FILTER=<ID=VarMMQS,Description="Average mismatch quality sum for variant reads > 100">
##FILTER=<ID=VarMapQual,Description="Average mapping quality of variant reads < 15">
##FILTER=<ID=VarBaseQual,Description="Average base quality of variant reads < 28">
##FILTER=<ID=Strand,Description="Strand representation of variant reads < 0.01">
##FILTER=<ID=RefMapQual,Description="Average mapping quality of reference reads < 15">
##FILTER=<ID=RefBaseQual,Description="Average base quality of reference reads < 28">
##FILTER=<ID=MMQSdiff,Description="Mismatch quality sum difference (ref - var) > 50">
##FILTER=<ID=MapQualDiff,Description="Mapping quality difference (ref - var) > 50">
##FILTER=<ID=ReadLenDiff,Description="Average supporting read length difference (ref - var) > 0.25">

Do I need to add this information to the sample annotation, or change the VCF file?

vyepez88 commented 7 months ago

It seems that MinMMQSdiff is not defined in the VCF file header. You would need to modify your VCF files.

Acetyl-ZHOU commented 7 months ago

So can I just write something for MinMMQSdiff? I don't know how to add this to my VCF.
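
One possible approach, sketched here under explicit assumptions, is to rewrite the VCF text so that the FILTER column uses semicolons and every filter name used in a record has a `##FILTER` header line. The function name and the placeholder description are invented for illustration. Whether a placeholder is appropriate for `MinMMQSdiff` and `NoReadCounts`, or whether they should be mapped to names the variant caller already declares, is a decision this sketch does not make. It also holds all lines in memory, so it only suits files of modest size.

```python
import re

# Hypothetical wording; replace with the caller's real definition if known.
PLACEHOLDER = "Placeholder added during header repair; original meaning not documented"


def repair_filter_column(vcf_lines):
    """Return VCF lines with ';'-separated FILTER values and a header
    entry for every filter name that is used but not declared."""
    defined, used, fixed = set(), [], []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FILTER="):
            match = re.match(r"##FILTER=<ID=([^,>]+)", line)
            if match:
                defined.add(match.group(1))
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            if len(fields) > 6:
                # Assumption: commas were used where ';' was intended.
                names = [n for n in re.split(r"[;,]", fields[6]) if n]
                fields[6] = ";".join(names) if names else "."
                for name in names:
                    if name not in used:
                        used.append(name)
                line = "\t".join(fields)
        fixed.append(line)

    missing = [n for n in used if n not in defined and n not in ("PASS", ".")]
    repaired = []
    for line in fixed:
        if line.startswith("#CHROM"):
            # New header entries go just before the column header line.
            for name in missing:
                repaired.append(f'##FILTER=<ID={name},Description="{PLACEHOLDER}">')
        repaired.append(line)
    return repaired
```

The result would still need to be written out, compressed with bgzip, and re-indexed with tabix before DROP can read it. Re-indexing also addresses the earlier "index file is older than the data file" warning.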

vyepez88 commented 7 months ago

Consider validating your VCF files beforehand, for example with the VCF-format-only validation from GATK: https://gatk.broadinstitute.org/hc/en-us/articles/360037057272-ValidateVariants
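
As a rough sketch of how that check could be scripted, the helper below builds a `gatk ValidateVariants` call that excludes all strict validation types, leaving only the format tests. The helper names are hypothetical, and the command line follows the usage described on the linked GATK page as best understood here; check it against the documentation for the installed GATK version.

```python
import shutil
import subprocess


def validate_variants_command(vcf_path, reference_fasta, gatk_cmd="gatk"):
    """Build the argument list for a format-only ValidateVariants run."""
    return [
        gatk_cmd, "ValidateVariants",
        "-R", reference_fasta,
        "-V", vcf_path,
        # Excluding ALL strict checks leaves only the VCF format tests.
        "--validation-type-to-exclude", "ALL",
    ]


def run_validation(vcf_path, reference_fasta, gatk_cmd="gatk"):
    """Run the validation and return (exit code, stderr text)."""
    if shutil.which(gatk_cmd) is None:
        raise FileNotFoundError(f"{gatk_cmd!r} was not found on PATH")
    result = subprocess.run(
        validate_variants_command(vcf_path, reference_fasta, gatk_cmd),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

A non-zero exit code from `run_validation` indicates that GATK rejected the file; the returned stderr text usually names the offending record or header line.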