Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Keep values from input INFO column in output #1669

Closed schreyers closed 1 month ago

schreyers commented 1 month ago

Heya, this might already be a feature, but I can't find it, sorry.

In my input line, the INFO column has data that I want to keep (DP and clinvar, for example), but when I annotate, it gets removed from the output.

Is there a flag to keep the INFO column in the output? Or to keep the DP / Clinvar columns?

Input would be:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr17 7577121 rs121913343 G A 100 PASS DP=14519;clinvar=1|pathogenic,1|pathogenic,1|uncertain_significance,1|conflicting_interpretations_of_pathogenicity,1|pathogenic,1|not_provided,1|likely_pathogenic,1|likely_pathogenic,1|likely_pathogenic;cosmic=1|COSM10659,1|COSM1645518,1|COSM3355991,1|COSM99933;phyloP=1.856;CSQT=1|TP53|NM_000546.5|missense_variant GT:GQ:AD:DP:VF:NL:SB:NC 0/1:100:8273,6243:14519:0.430:19:-100.0000:0.0070

My VEP run line is:

./vep --cache --format vcf --no_stats --sift b --polyphen b --symbol --numbers --domains --regulatory --canonical --protein --biotype --max_af --pubmed --uniprot --mane --tsl --appris --variant_class --gene_phenotype --mirna --check_existing --allele_number --show_ref_allele --uploaded_allele --use_given_ref --hgvsp_use_prediction --hgvs --fasta /opt/ensembl-vep/FASTA/Homo_sapiens.GRCh37.dna.primary_assembly.fa --force_overwrite --species homo_sapiens --merged --assembly GRCh37 --tab -i input.vcf -o output.txt --plugin LOVD

Thanks for any help :)

schreyers commented 1 month ago

I see that if I use the --vcf flag instead of --tab, all the input data is kept, but then the --filter doesn't work (I think).

So is there a way to convert the --vcf output to look like the --tab version, with all the data?

nakib103 commented 1 month ago

Hi @schreyers,

Thanks for your query!

As you say, if you want your input VCF fields to be present in the VEP output, you can just output a VCF file (using the --vcf option).

Filtering should also work on VCF files. See the filter_vep documentation for more details: https://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html Let me know if filter_vep does not work.
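As a rough sketch of filtering VCF output with filter_vep: this assumes your run was repeated with --vcf and written to output.vcf. The file names, the function name and the filter expression are placeholders I made up, not something from your run. The call is wrapped in a function so nothing executes until you call it.

```shell
# Placeholder names: adjust to your own files and filter.
vep_vcf="output.vcf"            # VEP output produced with --vcf (assumed name)
filtered_vcf="filtered.vcf"     # where the filtered variants go (assumed name)
filter_expr="SYMBOL is TP53"    # example filter; needs --symbol in the VEP run

# Run filter_vep on the VCF output; the input INFO fields stay in the file.
run_filter() {
  ./filter_vep -i "$vep_vcf" -o "$filtered_vcf" --filter "$filter_expr"
}
```

Calling run_filter from the ensembl-vep directory should then write the matching lines to filtered.vcf.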

Best regards, Nakib

schreyers commented 1 month ago

Thanks @nakib103 !

I was hoping to make it work with the --tab option, though. Is there a way to do that?

Thanks

nakib103 commented 1 month ago

The --tab output does not keep the VCF fields, and there is no argument to keep them. As a roundabout alternative, you can probably use the input file again with the --custom option to add those INFO fields back: https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html

schreyers commented 1 month ago

Ok, I can give that a go :)

So, just to make sure I get the process correct:

  1. Run my original command but with the --vcf option
  2. Use the output of step one (as the custom file input) and run a command with the --custom option to output the data as a tab file
  3. Filter if needed

The thing that confuses me is the Filename field required for the --custom option, as I would already have done the annotation needed for the data.

Appreciate the help so far :)

nakib103 commented 1 month ago

Not quite 😅,

  1. Create a separate copy of your input file and process the copy so it is suitable for custom annotation (basically bgzip compression and tabix indexing; see the doc I shared for details)
  2. Run your command as it is (with the --tab option) but add the --custom option. It would look something like this: --custom file=copy_of_input.vcf.gz,format=vcf,short_name=test,fields=DP%clinvar
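The two steps above could be sketched roughly as follows. This is only an outline: the function names are made up for illustration, input.vcf and output.txt come from your command, and the VEP call is trimmed to a few flags, with the rest of your original options passed through as extra arguments. Nothing runs until the functions are called.

```shell
# File names from the thread; function names are hypothetical.
input_vcf="input.vcf"
custom_vcf="copy_of_input.vcf.gz"
custom_arg="file=${custom_vcf},format=vcf,short_name=test,fields=DP%clinvar"

# Step 1: bgzip-compress a copy of the input and index it with tabix.
# tabix expects the VCF to be sorted by chromosome and position.
prepare_custom_copy() {
  bgzip -c "$input_vcf" > "$custom_vcf"
  tabix -p vcf "$custom_vcf"
}

# Step 2: the usual --tab run plus --custom; "$@" carries any other flags.
run_vep_tab() {
  ./vep --cache --format vcf --tab -i "$input_vcf" -o output.txt \
    --custom "$custom_arg" "$@"
}
```

For example, `prepare_custom_copy && run_vep_tab --species homo_sapiens --merged --assembly GRCh37 --symbol`. As far as I know, the added values then show up in columns prefixed with the short name (test_DP, test_clinvar), but check the header of your output to confirm.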

Hope that helps.

schreyers commented 1 month ago

Thanks so much @nakib103 !

It makes sense now :)