Open GACGAMA opened 1 month ago
Hi @GACGAMA, hope you are well. Could you please share your filter_vep command? According to the documentation for --flag_pick_allele, the PICK flag is added to the chosen block of consequence data. Let us know if you expect something different. Thank you, Ola.
Hi @olaaustine
I'm currently using:
singularity exec -H /scratch4/nsobrei2/references/vep_cache_singularity /scratch4/nsobrei2/singularities/vep.sif filter_vep --force_overwrite --input_file {1} --output_file /scratch4/nsobrei2/ggama1/OMIM_GENES/vep/filtering/PICK1_STEP2.vcf --only_matched --filter "PICK = 1"
Before that I was trying with R by expanding the CSQ column. With both methods I get 118028 variants without filtering.
To count variants I used both bcftools and R. In R, I expanded the CSQ column and removed duplicates based on chrom, pos, ref, alt on a normalized VCF.
For bcftools:
bcftools query -f '%POS\n' myvcf | wc -l
But when I filter on the PICK column in R, I get only 110713 variants. With filter_vep, I get only 110722 variants.
I expected that Ensembl would always PICK at least one transcript with --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length
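To narrow down which records are lost, one option is to list every record that has no CSQ entry with PICK set to 1. The following is a minimal sketch, not an official VEP tool: it assumes VEP VCF output where the `##INFO=<ID=CSQ,...>` header line carries a `Format: ...` list of pipe-separated field names that includes `PICK`, and the function names `csq_fields` and `records_without_pick` are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def csq_fields(header_lines: Iterable[str]) -> List[str]:
    """Return the CSQ sub-field names listed in the VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no CSQ 'Format:' description found in the header")


def records_without_pick(
    lines: Iterable[str],
) -> Iterator[Tuple[str, str, str, str, int]]:
    """Yield (chrom, pos, ref, alt, n_csq_entries) for each record
    in which no CSQ entry has PICK == "1"."""
    header: List[str] = []
    pick_idx = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            header.append(line)
            continue
        if line.startswith("#"):  # the #CHROM column line
            continue
        if pick_idx is None:
            # Assumption: PICK is one of the CSQ fields (raises otherwise).
            pick_idx = csq_fields(header).index("PICK")
        cols = line.split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        # Pull the CSQ value out of the INFO column, if present.
        csq = next(
            (kv[4:] for kv in info.split(";") if kv.startswith("CSQ=")), ""
        )
        entries = [entry.split("|") for entry in csq.split(",")] if csq else []
        picked = any(
            len(entry) > pick_idx and entry[pick_idx] == "1" for entry in entries
        )
        if not picked:
            yield chrom, pos, ref, alt, len(entries)
```

Passing an open handle on the uncompressed annotated VCF to `records_without_pick` yields one tuple per record without a picked entry. The last element distinguishes records that have no CSQ annotation at all (0) from records that have CSQ entries but none flagged.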
Hi @GACGAMA, hope you are well. If possible, could you share your input file or a subset of the variants in your input so I can try to reproduce the issue? Thank you very much, Ola.
Hi @olaaustine
I can send a minimal working example VCF. Is there an email address I can send this data to? I've been able to reproduce this issue on multiple files.
Hi @GACGAMA, if possible, could you share the minimal working example VCF here? If that's not possible, please do not hesitate to send it using this link https://www.ensembl.org/Help/Contact with the title "flag_pick_allele VCF file". Hoping to hear from you soon. Thank you very much, Ola.
System
VEP 111 Docker/Singularity
Full VEP command line
Problem
As you can see from my command, I'm annotating my file with many different flags. With this command, I get a multisample VCF file with 118028 variants. But, if I filter this file with PICK = 1, I get only 110713 variants.
Is this the expected behaviour when flagging variants? I thought every variant would always have at least one transcript flagged.