Closed rsalz closed 2 years ago
Hi @rsalz, I cannot reproduce your error. I'm able to use a VEP command with annotations from both gff file and cache. Could you please:
The gff file came from TransDecoder. I attached the bgzipped gff file that I use. The command I try to use is the following:
vep --cache --merged --assembly GRCh38 --cache_version 101 --max_af --sift b --polyphen b --protein --domains --gff transdecoder.gff3.gz --pick_allele --pick_order rank,biotype,length --fasta hg38.fa -i all_patients.vcf -o annotated_severe.tab
Just removing --gff transdecoder.gff3.gz is enough to get the command to work.
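For anyone hitting GFF-related failures: VEP's custom-annotation docs require the GFF to be sorted by position, bgzipped, and tabix-indexed before use with --gff. A typical prep sketch (assuming htslib's bgzip and tabix are on PATH, and using the file name from this thread):

```shell
# Sort the GFF by chromosome, start, and end, compress with bgzip,
# then build a tabix index so VEP can query it by region.
grep -v "^#" transdecoder.gff3 | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > transdecoder.gff3.gz
tabix -p gff transdecoder.gff3.gz
```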
I still cannot reproduce the issue. How big is your input file and which types of variants does it contain? If you are allowed to, could you please email the file to helpdesk@ensembl.org?
I can't share the variants. So I tried the annotation again with a publicly available variant set (NA12878) and had the same error
Are you using an indexed cache? http://ftp.ensembl.org/pub/release-104/variation/indexed_vep_cache/
Yes, I installed the cache using the vep_install prompt from the conda wrapper.
I tested the command with the NA12878 input file and your gff file, and I still cannot reproduce the error. You could try removing the cache downloaded with conda and downloading it manually from here. Documentation about the cache can be found here. If that doesn't work, you could download the fasta file following these instructions. The files are available here.
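A manual cache install might look like the sketch below (the exact tarball name is an assumption based on the FTP layout; note the thread mixes --cache_version 101 with a release-104 link, so pick the release that matches the --cache_version flag):

```shell
# Download the indexed, merged GRCh38 cache and unpack it into the
# default VEP cache directory ($HOME/.vep).
cd $HOME/.vep
curl -O http://ftp.ensembl.org/pub/release-104/variation/indexed_vep_cache/homo_sapiens_merged_vep_104_GRCh38.tar.gz
tar xzf homo_sapiens_merged_vep_104_GRCh38.tar.gz
```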
Neither of these actions fixed my error
How much time does it take for the job to fail?
2-3 minutes. There is output, but it never makes it past the variants in chromosome 1. I get about 50K annotated variants before vep suddenly stops.
Do you know how much memory the job uses? Using the file NA12878, which variant is the last one to be annotated before the job fails?
Did you follow these instructions to prepare the fasta file:
curl -O http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gzip -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
bgzip Homo_sapiens.GRCh38.dna.primary_assembly.fa
Could you please run the job again with the --offline flag, to make sure VEP doesn't connect to the database?
I tried bgzip, no change. Then I ran it again with the --offline flag and now the error is gone. Thanks for your help!
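For future readers, the working invocation is simply the original command plus --offline (a sketch, with file names exactly as in the thread):

```shell
# Same command as before; --offline prevents VEP from falling back to
# database connections, which resolved the segfault in this thread.
vep --cache --offline --merged --assembly GRCh38 --cache_version 101 \
    --max_af --sift b --polyphen b --protein --domains \
    --gff transdecoder.gff3.gz \
    --pick_allele --pick_order rank,biotype,length \
    --fasta hg38.fa -i all_patients.vcf -o annotated_severe.tab
```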
Describe the issue
Unable to combine --cache and --gff without getting a 'Segmentation fault (core dumped)'. Reducing the buffer size did not help.
Additional information
I am trying to annotate variants using the information in the cache for GRCh38 but would also like to annotate with my own transcript set. The goal is to get the most 'severe' effect per variant while taking both the reference and the custom transcript annotations into account. I also want to add a few extra things to the output such as allele frequency and polyphen predictions for GRCh38 (even if the most severe effect occurs in the custom annotation!), hence the use of the cache.
As an alternative, I used a gff combiner to merge the reference and custom annotations and ran VEP without the cache. That worked fine, but then I couldn't also annotate the variants with max AF. I even tried adding more '--custom' annotations from VEP's ftp to add the max AF, but I was getting variant identifiers instead of the actual allele frequency... ('--custom http://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ExAC.0.3.GRCh38.vcf.gz,exacAF,vcf,exact,,')
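On the --custom attempt above: VEP's --custom string is filename,short_name,format,type,force_coordinates,field-list, and leaving the field list empty makes VEP report only the overlapping record's ID, which would explain seeing variant identifiers instead of frequencies. A sketch that extracts the AF INFO field instead (the field name AF is an assumption about that ExAC VCF's header; check it with zcat | head):

```shell
# Append ",AF" so VEP pulls the AF INFO field from the custom VCF
# rather than reporting the matching record's ID.
vep ... --custom http://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ExAC.0.3.GRCh38.vcf.gz,exacAF,vcf,exact,0,AF
```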
I know that running the GRCh38 cache and the custom gff individually works, but I don't want to run VEP twice, and I want the most severe effect taking both into account. I'm open to any solution that gets me the output I want in a single VEP run.
System
downloaded VEP using anaconda
Full VEP command line
Full error message
Data files (if applicable)
gff3 file organized and verified using the AGAT utility. I know the gff3 is fine because VEP works properly when annotating with this file alone, without the cache. VEP also works properly when removing the gff file and using only the cache.