Closed ysm0128 closed 8 years ago
Apologies for the delay. For snpeff, could you try using version 4.1e instead of 4.1l? We have not yet tested STMP on 4.1l and it looks like the parameters have changed slightly.
Additionally, which version of bedtools are you using? The segmentation fault could be due to an older version of bedtools being used (we are currently using version 2.25.0). Also, could you try re-running with the latest code? I just committed a new version that addresses several issues with intersectBed.
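To answer the bedtools question quickly, a small check like the one below may help. It is only a sketch: it assumes `bedtools` is on your `PATH` and that `bedtools --version` prints a dotted version number. The helper names `parse_version` and `bedtools_version` are made up for this example and are not part of STMP.

```python
import re
import subprocess

MIN_BEDTOOLS = (2, 25, 0)  # the version the maintainers say they are using


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def bedtools_version(executable="bedtools"):
    """Run `<executable> --version` and return the parsed version tuple."""
    # Popen is used instead of check_output so this also runs on Python 2.6.
    proc = subprocess.Popen([executable, "--version"],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = proc.communicate()[0].decode("utf-8", "replace")
    return parse_version(output)


if __name__ == "__main__":
    try:
        found = bedtools_version()
    except (OSError, ValueError) as exc:
        print("could not determine bedtools version: %s" % exc)
    else:
        status = "OK" if found >= MIN_BEDTOOLS else "older than 2.25.0"
        print("bedtools %s: %s" % (".".join(map(str, found)), status))
```

Comparing tuples of integers avoids the string-comparison trap where "2.9" would sort after "2.25".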
Were you able to get this working? Could you try again with the latest version and reopen this issue if it persists?
Hi, I'm interested in STMP and would like to run it. Whenever I run STMP on an input VCF, an error occurs. I have been trying to figure out what is causing the error, but I can't identify the problem.
Here is the command I ran and its output.
[ysm0128@piano stmp_release]$ python stable/code/stmp.py --vcf=sample_input_data/genome_in_a_bottle/subset.rs.vcf --output_dir=sample_outputs/genome_in_a_bottle_output
using database file: /storage/home/ysm0128/download/stmp_release/stable/db/annotationDB.sqlite
Converting VCF so there is only 1 allele per line
bcftools split multiallelic command: bcftools norm -m - '/storage/home/ysm0128/download/stmp_release/sample_input_data/genome_in_a_bottle/subset.rs.vcf' -O z -o '/storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/subset_rs_noMultiallelic.vcf.gz'
Lines total/modified/skipped: 878/0/0
Stripping chr prefix from VCF CHROM column (if present)
Running snpEff on subset_rs_noMultiallelic_vc_strippedChr_vc
snpeff annot cmd: java -Xmx6g -jar /storage/home/ysm0128/download/stmp_release/third_party/snpeff/snpEff/snpEff.jar eff hg19 /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/subset_rs_noMultiallelic_vc_strippedChr.vcf.gz -stats /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch -csvStats
sample_subset_rs_noMultiallelic_vc_strippedChr_vc found in database. Beginning annotation without re-upload.
Running annovar on subset_rs_noMultiallelic_vc_strippedChr_vc
annovar annotation cmd: /storage/home/ysm0128/download/stmp_release/third_party/annovar/table_annovar.pl /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/subset_rs_noMultiallelic_vc_strippedChr.vcf.gz /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb -buildver hg19 -out /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar -remove -protocol refGene,knownGene,wgEncodeGencodeBasicV19 -operation g,g,g -vcfinput
Annotating variants for sample sample_subset_rs_noMultiallelic_vc_strippedChr_vc
Performing BEDtools intersections with 3 tables
range annotation cmd: (cat /storage/home/ysm0128/download/stmp_release/stable/code/../db/db_beds/refseq_r.bed.header; intersectBed -a /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/subset_rs_noMultiallelic_vc_strippedChr.vcf.gz -b /storage/home/ysm0128/download/stmp_release/stable/code/../db/db_beds/refseq_r.bed -loj | /storage/home/ysm0128/download/stmp_release/stable/code/condense_intersectBed_output.py --modules=/storage/home/ysm0128/download/stmp_release/stable/code/config/modules.yml | cut -f13)
range annotation cmd: (cat /storage/home/ysm0128/download/stmp_release/stable/code/../db/db_beds/hg19_phastConsElements46way_r.bed.header; intersectBed -a /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/subset_rs_noMultiallelic_vc_strippedChr.vcf.gz -b /storage/home/ysm0128/download/stmp_release/stable/code/../db/db_beds/hg19_phastConsElements46way_r.bed -loj | /storage/home/ysm0128/download/stmp_release/stable/code/condense_intersectBed_output.py --modules=/storage/home/ysm0128/download/stmp_release/stable/code/config/modules.yml | cut -f13)
range annotation cmd: (cat /storage/home/ysm0128/download/stmp_release/stable/code/../db/db_beds/exac_tolerance_r.bed.header; intersectBed -a /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/subset_rs_noMultiallelic_vc_strippedChr.vcf.gz -b /storage/home/ysm0128/download/stmp_release/stable/code/../db/db_beds/exac_tolerance_r.bed -loj | /storage/home/ysm0128/download/stmp_release/stable/code/condense_intersectBed_output.py --modules=/storage/home/ysm0128/download/stmp_release/stable/code/config/modules.yml | cut -f13)
NOTICE: Running with system command <convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/subset_rs_noMultiallelic_vc_strippedChr.vcf.gz > /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.avinput>
/storage/home/ysm0128/bin/intersectBed: line 2: 157146 Segmentation fault (core dumped) ${0%/*}/bedtools intersect "$@"
/storage/home/ysm0128/bin/intersectBed: line 2: 157152 Segmentation fault (core dumped) ${0%/*}/bedtools intersect "$@"
Error : Missing parameter: CSV stats file name
Command line : SnpEff hg19 /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/subset_rs_noMultiallelic_vc_strippedChr.vcf.gz -stats /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch -csvStats
snpEff version SnpEff 4.1l (build 2015-10-03), by Pablo Cingolani
Usage: snpEff [eff] [options] genome_version [input_file]
Options:
-chr : Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). Only on TXT output.
-classic : Use old style annotations instead of Sequence Ontology and Hgvs.
-csvStats : Create CSV summary file.
-download : Download reference genome if not available. Default: true
-i : Input format [ vcf, bed ]. Default: VCF.
-fileList : Input actually contains a list of files to process.
-o : Ouput format [ vcf, gatk, bed, bedAnn ]. Default: VCF.
-s , -stats, -htmlStats : Create HTML summary file. Default is 'snpEff_summary.html'
-noStats : Do not create stats (summary) file
Results filter options:
-fi , -filterInterval : Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)
-no-downstream : Do not show DOWNSTREAM changes
-no-intergenic : Do not show INTERGENIC changes
-no-intron : Do not show INTRON changes
-no-upstream : Do not show UPSTREAM changes
-no-utr : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes
-no EffectType : Do not show 'EffectType'. This option can be used several times.
Annotations options:
-cancer : Perform 'cancer' comparisons (Somatic vs Germline). Default: false
-cancerSamples : Two column TXT file defining 'oringinal \t derived' samples.
-formatEff : Use 'EFF' field compatible with older versions (instead of 'ANN').
-geneId : Use gene ID instead of gene name (VCF output). Default: false
-hgvs : Use HGVS annotations for amino acid sub-field. Default: true
-lof : Add loss of function (LOF) and Nonsense mediated decay (NMD) tags.
-noHgvs : Do not add HGVS annotations.
-noLof : Do not add LOF and NMD annotations.
-noShiftHgvs : Do not shift variants according to HGVS notation (most 3prime end).
-oicr : Add OICR tag in VCF file. Default: false
-sequenceOntology : Use Sequence Ontology terms. Default: true
Generic options:
-c , -config : Specify config file
-configOption name=value : Override a config file option
-d , -debug : Debug mode (very verbose).
-dataDir : Override data_dir parameter from config file.
-download : Download a SnpEff database, if not available locally. Default: true
-nodownload : Do not download a SnpEff database, if not available locally.
-noShiftHgvs : Do not shift variants towards most 3-prime position (as required by HGVS).
-h , -help : Show this help and exit
-noLog : Do not report usage statistics to server
-t : Use multiple threads (implies '-noStats'). Default 'off'
-q , -quiet : Quiet mode (do not show any messages or errors)
-v , -verbose : Verbose mode
-version : Show version number and exit
Database options:
-canon : Only use canonical transcripts.
-interval : Use a custom intervals in TXT/BED/BigBed/VCF/GFF file (you may use this option many times)
-motif : Annotate using motifs (requires Motif database).
-nextProt : Annotate using NextProt (requires NextProt database).
-noGenome : Do not load any genomic database (e.g. annotate using custom files).
-noMotif : Disable motif annotations.
-noNextProt : Disable NextProt annotations.
-onlyReg : Only use regulation tracks.
-onlyProtein : Only use protein coding transcripts. Default: false
-onlyTr : Only use the transcripts in this file. Format: One transcript ID per line.
-reg : Regulation track to use (this option can be used add several times).
-ss , -spliceSiteSize : Set size for splice sites (donor and acceptor) in bases. Default: 2
-spliceRegionExonSize : Set size for splice site region within exons. Default: 3 bases
-spliceRegionIntronMin : Set minimum number of bases for splice site region within intron. Default: 3 bases
-spliceRegionIntronMax : Set maximum number of bases for splice site region within intron. Default: 8 bases
-strict : Only use 'validated' transcripts (i.e. sequence has been checked). Default: false
-ud , -upDownStreamLen : Set upstream downstream interval length (in bases)
done with intersectbed merge
Finished bedtools region (range) annotation
NOTICE: Finished reading 1012 lines from VCF file
NOTICE: A total of 884 locus in VCF file passed QC threshold, representing 758 SNPs (557 transitions and 201 transversions) and 126 indels/substitutions
NOTICE: Finished writing allele frequencies based on 758 SNP genotypes (557 transitions and 201 transversions) and 126 indels/substitutions for 1 samples
Annotating variants for sample sample_subset_rs_noMultiallelic_vc_strippedChr_vc
NOTICE: Running with system command </storage/home/ysm0128/download/stmp_release/third_party/annovar/table_annovar.pl /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.avinput /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb -buildver hg19 -outfile /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar -remove -protocol refGene,knownGene,wgEncodeGencodeBasicV19 -operation g,g,g -otherinfo -nastring .>
Performing SQL join with 5 tables: clinvar hg19_ljb26_all uk10k_freq gonl hg19_popfreq_all_20150413
computing total number of lines to annotate
Traceback (most recent call last):
  File "stable/code/stmp.py", line 384, in <module>
    joined_outfile = annotate(args)
  File "stable/code/stmp.py", line 188, in annotate
    point_outfile = stmp_annotation_util.annotate_point(db_conn, args.vcf, args.scratch_dir, sample_db_path, debug=args.debug_point_annotations) # Find annotations which are associated with a single locus. This is done with a SQL join.
  File "/storage/home/ysm0128/download/stmp_release/stable/code/stmp_annotation_util.py", line 1575, in annotate_point
    totalLines = (subprocess.check_output(lines_cmd, shell=True)).rstrip()
AttributeError: 'module' object has no attribute 'check_output'
[ysm0128@piano stmp_release]$ -----------------------------------------------------------------
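A note on the traceback above: `subprocess.check_output` was added in Python 2.7, so this `AttributeError` usually means the script is being run by an older interpreter such as Python 2.6. Running STMP under Python 2.7 is probably the simplest fix. If that is not possible, a fallback along these lines could stand in for the call. This is a sketch only; `check_output_compat` is a hypothetical helper name and not part of STMP.

```python
import subprocess
import sys


def check_output_compat(cmd, **kwargs):
    """Return a command's stdout, like subprocess.check_output (Python 2.7+)."""
    if hasattr(subprocess, "check_output"):
        return subprocess.check_output(cmd, **kwargs)
    # Fallback for interpreters that predate check_output (e.g. Python 2.6).
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, **kwargs)
    output = proc.communicate()[0]
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return output


if __name__ == "__main__":
    # Show which interpreter is actually running, then exercise the helper.
    print("Python %d.%d" % sys.version_info[:2])
    print(check_output_compat([sys.executable, "-c", "print(3)"]).rstrip())
```

Printing `sys.version_info` first is a quick way to confirm whether the `python` on your `PATH` is older than 2.7.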
NOTICE: Processing operation=g protocol=refGene
NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype refGene -outfile /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.refGene -exonsort /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.avinput /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb>
NOTICE: Reading gene annotation from /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb/hg19_refGene.txt ... Done with 50914 transcripts (including 11516 without coding sequence annotation) for 26271 unique genes
NOTICE: Reading FASTA sequences from /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb/hg19_refGeneMrna.fa ... Done with 23 sequences
WARNING: A total of 345 sequences will be ignored due to lack of correct ORF annotation
NOTICE: Finished gene-based annotation on 884 genetic variants in /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.avinput
NOTICE: Output files were written to /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.refGene.variant_function, /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.refGene.exonic_variant_function
NOTICE: Processing operation=g protocol=knownGene
NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype knownGene -outfile /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.knownGene -exonsort /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.avinput /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb>
NOTICE: Reading gene annotation from /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb/hg19_knownGene.txt ... Done with 78963 transcripts (including 18502 without coding sequence annotation) for 28495 unique genes
NOTICE: Reading FASTA sequences from /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb/hg19_knownGeneMrna.fa ... Done with 38 sequences
WARNING: A total of 43 sequences will be ignored due to lack of correct ORF annotation
NOTICE: Finished gene-based annotation on 884 genetic variants in /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.avinput
NOTICE: Output files were written to /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.knownGene.variant_function, /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.knownGene.exonic_variant_function
NOTICE: Processing operation=g protocol=wgEncodeGencodeBasicV19
NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype wgEncodeGencodeBasicV19 -outfile /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.wgEncodeGencodeBasicV19 -exonsort /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.avinput /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb>
NOTICE: Reading gene annotation from /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb/hg19_wgEncodeGencodeBasicV19.txt ... Done with 95929 transcripts (including 38291 without coding sequence annotation) for 42594 unique genes
Error: FASTA sequence file /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb/hg19_wgEncodeGencodeBasicV19Mrna.fa does not exist. Please use 'annotate_variation.pl --downdb wgEncodeGencodeBasicV19 /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb' download the database.
Error running system command: <annotate_variation.pl -geneanno -buildver hg19 -dbtype wgEncodeGencodeBasicV19 -outfile /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.wgEncodeGencodeBasicV19 -exonsort /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.avinput /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb>
Error running system command: </storage/home/ysm0128/download/stmp_release/third_party/annovar/table_annovar.pl /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar.avinput /storage/home/ysm0128/download/stmp_release/third_party/annovar/humandb -buildver hg19 -outfile /storage/home/ysm0128/download/stmp_release/sample_outputs/genome_in_a_bottle_output/scratch/annovar -remove -protocol refGene,knownGene,wgEncodeGencodeBasicV19 -operation g,g,g -otherinfo -nastring .>
I have confirmed that the folder and file do exist. What is causing the error, and how do I resolve it? I would like to run this tool as soon as possible, so any help would be appreciated.
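Since the ANNOVAR log above complains about a specific missing FASTA file, it may be worth double-checking exactly which files are present. The sketch below assumes the naming pattern visible in the log, `<buildver>_<protocol>.txt` plus `<buildver>_<protocol>Mrna.fa`. The function name and the relative `humandb` path are illustrative only.

```python
import os


def missing_annovar_files(humandb_dir, protocols, buildver="hg19"):
    """Return the expected gene-annotation files that are absent from humandb_dir."""
    missing = []
    for protocol in protocols:
        # File names follow the pattern seen in the ANNOVAR log output.
        for suffix in (".txt", "Mrna.fa"):
            name = "%s_%s%s" % (buildver, protocol, suffix)
            path = os.path.join(humandb_dir, name)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Assumed location relative to the stmp_release directory; adjust as needed.
    protocols = ["refGene", "knownGene", "wgEncodeGencodeBasicV19"]
    for path in missing_annovar_files("third_party/annovar/humandb", protocols):
        print("missing: %s" % path)
```

If `hg19_wgEncodeGencodeBasicV19Mrna.fa` is reported as missing, the ANNOVAR error message itself names the `annotate_variation.pl --downdb` command to fetch it.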