I see that if I use the --vcf flag instead of --tab, all the data is merged, but then --filter doesn't seem to work (I think).
So is there a way to convert the --vcf output into something like the --tab version, with all the data?
Hi @schreyers,
Thanks for your query!
As you say, if you want your input VCF fields to be present in the VEP output, you can just output a VCF file (using the --vcf option).
The -filter option of filter_vep should also work on VCF files. See the documentation for more details:
https://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html
Let me know if filter_vep does not work.
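On the original question of turning --vcf output into a --tab-like layout: VEP's VCF output keeps the input INFO fields and appends a CSQ INFO field, whose pipe-separated subfield names are listed in the `##INFO=<ID=CSQ,...Format: ...>` header line, with one comma-separated entry per consequence. The sketch below is not part of VEP; it flattens each CSQ entry into its own row and repeats chosen input INFO values alongside it. DP and clinvar are the field names from the input shown in the original question, and the script name `flatten_csq.py` is hypothetical. Records without a CSQ field are skipped.

```python
import sys
from typing import Iterable, Iterator


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column (key=value;FLAG;...) into a dict; flags map to ''."""
    fields = {}
    for item in info.split(";"):
        if item and item != ".":
            key, _, value = item.partition("=")
            fields[key] = value
    return fields


def csq_names(header_line: str) -> list[str]:
    """Read the CSQ subfield names from VEP's ##INFO=<ID=CSQ,...Format: A|B|C"> line."""
    return header_line.split("Format: ", 1)[1].rstrip('">\n').split("|")


def flatten(vcf_lines: Iterable[str], keep: list[str]) -> Iterator[list[str]]:
    """Yield a header row, then one row per CSQ entry, repeating the kept INFO values.

    Records without a CSQ field are skipped.
    """
    names: list[str] = []
    for line in vcf_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            names = csq_names(line)
        elif line.startswith("#CHROM"):
            yield ["CHROM", "POS", "ID", "REF", "ALT", *keep, *names]
        elif line.startswith("#"):
            continue  # other meta-information lines
        else:
            cols = line.rstrip("\n").split("\t")
            info = parse_info(cols[7])
            kept = [info.get(key, "-") for key in keep]  # '-' for a missing value, as in VEP tab output
            # One CSQ entry per consequence, separated by commas.
            for entry in filter(None, info.get("CSQ", "").split(",")):
                yield [*cols[:5], *kept, *entry.split("|")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): python flatten_csq.py output.vcf > output.tsv
    with open(sys.argv[1]) as handle:
        for row in flatten(handle, ["DP", "clinvar"]):
            print("\t".join(row))
```

The filtering itself can still be done on the VCF with filter_vep first; this only reshapes the result afterwards.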
Best regards, Nakib
Thanks @nakib103 !
I was hoping to make it work with the --tab option, though. Is there a way to do that?
Thanks
The --tab output does not keep the input VCF fields, and there is no argument to keep them. As a roundabout workaround, you can probably pass the input file again with the --custom option to add those INFO fields back:
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
Ok, I can give that a go :)
So, just to make sure I get the process correct:
What confuses me is the Filename field required by the --custom option, since I would already have done the annotation needed for the data.
Appreciate the help so far :)
Not quite 😅. First, make a copy of your input VCF prepared as a custom annotation file (bgzip compression and tabix indexing; see the doc I shared for details). Then run VEP as before (with the --tab option) but add the --custom option. It would look something like this:
--custom file=copy_of_input.vcf.gz,format=vcf,short_name=test,fields=DP%clinvar
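A minimal Python sketch of the whole workaround described above, assuming htslib's bgzip and tabix are on PATH and VEP is run as `./vep` from the current directory. The copy name `copy_of_input.vcf`, the script name `add_info.py`, and the shortened VEP option list are placeholders; add the rest of your usual options. `custom_arg` only reproduces the key=value syntax from the example (comma-separated pairs, multiple fields joined by `%`); check the custom annotation docs for other keys. tabix needs the VCF sorted by position.

```python
import shutil
import subprocess
import sys

COPY = "copy_of_input.vcf"  # hypothetical name, matching the --custom example


def custom_arg(path: str, short_name: str, fields: list[str], fmt: str = "vcf") -> str:
    """Build the --custom value: comma-separated key=value pairs, fields joined by '%'."""
    opts = {"file": path, "format": fmt, "short_name": short_name, "fields": "%".join(fields)}
    return ",".join(f"{key}={value}" for key, value in opts.items())


def commands(input_vcf: str, output_txt: str) -> list[list[str]]:
    """Commands to bgzip + tabix the copy, then run VEP --tab with --custom."""
    return [
        ["bgzip", "-f", COPY],  # replaces COPY with COPY.gz (BGZF, which tabix needs)
        ["tabix", "-f", "-p", "vcf", COPY + ".gz"],  # writes the .tbi index; input must be sorted
        # Abbreviated: add the rest of your usual options (--fasta, --assembly, ...).
        ["./vep", "--cache", "--tab", "-i", input_vcf, "-o", output_txt,
         "--custom", custom_arg(COPY + ".gz", "test", ["DP", "clinvar"])],
    ]


def run(input_vcf: str, output_txt: str) -> None:
    shutil.copyfile(input_vcf, COPY)  # leave the original input untouched
    for cmd in commands(input_vcf, output_txt):
        subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__" and len(sys.argv) == 3:
    run(sys.argv[1], sys.argv[2])  # e.g. python add_info.py input.vcf output.txt
```

Keeping the command lists separate from `run` makes it easy to print and inspect them before anything is executed.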
Hope that helps.
Thanks so much @nakib103 !
It makes sense now :)
Heya, this might already be a feature, but I can't find it, sorry.
In my input, the INFO column has data that I want to keep (DP and clinvar, for example), but when I annotate, it gets removed from the output.
Is there a flag to keep the INFO column in the output? Or to keep the DP / Clinvar columns?
Input would be:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr17 7577121 rs121913343 G A 100 PASS DP=14519;clinvar=1|pathogenic,1|pathogenic,1|uncertain_significance,1|conflicting_interpretations_of_pathogenicity,1|pathogenic,1|not_provided,1|likely_pathogenic,1|likely_pathogenic,1|likely_pathogenic;cosmic=1|COSM10659,1|COSM1645518,1|COSM3355991,1|COSM99933;phyloP=1.856;CSQT=1|TP53|NM_000546.5|missense_variant GT:GQ:AD:DP:VF:NL:SB:NC 0/1:100:8273,6243:14519:0.430:19:-100.0000:0.0070
My VEP run line is:
./vep --cache --format vcf --no_stats --sift b --polyphen b --symbol --numbers --domains --regulatory --canonical --protein --biotype --max_af --pubmed --uniprot --mane --tsl --appris --variant_class --gene_phenotype --mirna --check_existing --allele_number --show_ref_allele --uploaded_allele --use_given_ref --hgvsp_use_prediction --hgvs --fasta /opt/ensembl-vep/FASTA/Homo_sapiens.GRCh37.dna.primary_assembly.fa --force_overwrite --species homo_sapiens --merged --assembly GRCh37 --tab -i input.vcf -o output.txt --plugin LOVD
Thanks for any help :)