Open davmlaw opened 4 years ago
Hi,
Thanks for this report. You're correct, there's an underlying inconsistency and documentation issue here that we can improve.
Currently, our HGVSg and HGVSc descriptions take their underlying reference sequence directly from our internal sequence lookups, rather than from what's given by the user. This is designed to limit problems in downstream analyses by ensuring our HGVS is accurate. However, this behaviour is not overridden by --use_given_ref, when it arguably should be. As suggested, our documentation should also provide more detail about this behaviour.
Additionally, HGVSp does not follow this behaviour: it uses the given reference when calculating the appropriate peptide changes. This difference between HGVSp and HGVSc is something that we will look to address.
We'll discuss this within the team over the next few days to decide exactly how to tackle it, and we'll let you know how we intend to proceed. Thank you for bringing this to our attention.
Kind Regards, Andrew
Describe the issue
When you run VEP against a VCF record whose reference allele differs from the fasta sequence, the consequences are calculated against the provided base, while HGVS is calculated from the fasta reference sequence.
Example (GRCh37) - provided ref different from fasta reference:
Expected result: HGVS would match provided reference and other annotations
Actual result: HGVS is from fasta reference (17:7574012 C>A) and is inconsistent with consequences and input sequence
It may be that HGVS must use the reference sequence (as, e.g., changing a base may change a splice site and thus an exon, or alter alignment/normalization), in which case it may be a good idea to add a note to the documentation. I suggest here:
https://asia.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_hgvs
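Until this is resolved, one way to see which records are affected is to compare each VCF REF allele against the fasta before annotating. The following is a minimal Python sketch of that check, not part of VEP: the function name `find_ref_mismatches`, the in-memory `genome` dict, and the toy contig `chr_toy` are all made up for illustration, and a real run would read the sequence from an indexed fasta instead.

```python
def find_ref_mismatches(vcf_lines, genome):
    """Yield (chrom, pos, given_ref, fasta_ref) for VCF records whose
    REF allele differs from the sequence held in `genome`.

    `genome` is a hypothetical dict mapping contig name -> sequence string.
    """
    for line in vcf_lines:
        # Skip blank lines, meta lines (##...) and the column header (#CHROM...)
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, *_ = line.rstrip("\n").split("\t")
        pos = int(pos)
        # VCF positions are 1-based; Python slices are 0-based
        fasta_ref = genome[chrom][pos - 1 : pos - 1 + len(ref)]
        if fasta_ref.upper() != ref.upper():
            yield chrom, pos, ref, fasta_ref


# Toy data only (not the GRCh37 record from this report)
genome = {"chr_toy": "ACGTACGTAC"}
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr_toy\t3\t.\tG\tA",  # REF matches the toy sequence at position 3
    "chr_toy\t4\t.\tC\tA",  # REF differs: the toy sequence has T at position 4
]

mismatches = list(find_ref_mismatches(vcf_lines, genome))
for chrom, pos, given, expected in mismatches:
    print(f"{chrom}:{pos} given REF={given} fasta REF={expected}")
```

Records flagged this way are the ones where, per the behaviour described above, HGVSg/HGVSc would be built from the fasta base while consequences use the provided base.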
System
Full VEP command line