Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Fix inconsistencies between --hgvs and --hgvsg #1750

Closed (nuno-agostinho closed 2 months ago)

nuno-agostinho commented 2 months ago

ENSVAR-3174

Changelog

  1. Currently, requesting HGVSg in offline mode without a FASTA file returns HGVSg with Ns as the reference sequence. The correct behaviour is to mimic --hgvs and raise an error.
  2. When the database is accessed to retrieve the sequence for HGVSg, a message should be printed to inform the user, as is already done for --hgvs.
  3. When the sequence is retrieved from the database using --hgvs, the message is printed three times:
    2024-09-03 13:25:00 - INFO: Database will be accessed when using --hgvs
    2024-09-03 13:25:00 - INFO: Database will be accessed when using --hgvsc
    2024-09-03 13:25:00 - INFO: Database will be accessed when using --hgvsp

Because --hgvsc and --hgvsp are internal parameters rather than VEP arguments, their messages should be omitted so that only the --hgvs line is printed.
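
The three changelog items can be summarised as one decision rule. The following is a minimal sketch in Python, not the actual VEP implementation (which is written in Perl). The function name `sequence_source_messages`, the `ConfigError` class and the dictionary-based config are hypothetical and used only for illustration. The message texts are copied from this pull request.

```python
class ConfigError(Exception):
    """Hypothetical error type standing in for VEP's fatal configuration error."""


# Flags the user can pass on the command line (per this pull request).
USER_FLAGS = ("hgvs", "hgvsg")


def sequence_source_messages(config):
    """Return the INFO messages to print for a given (hypothetical) config dict.

    Raises ConfigError when HGVS notation is requested offline without FASTA.
    Internal parameters such as hgvsc/hgvsp are ignored, so each user-facing
    flag yields at most one message.
    """
    requested = [flag for flag in USER_FLAGS if config.get(flag)]

    # No HGVS output requested, or a FASTA file supplies the sequence:
    # the database is not needed, so there is nothing to report.
    if not requested or config.get("fasta"):
        return []

    # Items 1: offline without FASTA must fail for --hgvs and --hgvsg alike.
    if config.get("offline"):
        raise ConfigError(
            "Cannot generate HGVS coordinates (--hgvs and --hgvsg) "
            "in offline mode without a FASTA file (see --fasta)"
        )

    # Items 2 and 3: one message per user-facing flag, none for internal ones.
    return [f"Database will be accessed when using --{flag}" for flag in requested]
```

With this rule, a config in which --hgvs has internally switched on hgvsc and hgvsp still produces a single message, and --hgvsg is treated the same way as --hgvs.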

Testing

  1. vep --hgvsg --offline --cache $vep_cache --id "1 230710048 rs699 A G" --force should raise an error:

    MSG: ERROR: Cannot generate HGVS coordinates (--hgvs and --hgvsg) in offline mode without a FASTA file (see --fasta)
  2. vep --hgvsg --cache $vep_cache --id "1 230710048 rs699 A G" --force should warn that the database will be accessed to retrieve the sequence:

    2024-09-03 13:21:27 - INFO: Database will be accessed when using --hgvsg
  3. vep --hgvs --cache $vep_cache --id "1 230710048 rs699 A G" --force should print the database-access message only once, for --hgvs, with no separate lines for --hgvsc or --hgvsp.
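
To check test 3 without reading the log by eye, the captured output can be scanned for database-access messages. This is a small sketch that assumes the log format shown in this pull request. The helper name `database_access_flags` is hypothetical and not part of VEP.

```python
import re

# Matches the INFO line quoted in this pull request and captures the flag name.
_DB_MESSAGE = re.compile(r"INFO: Database will be accessed when using (--\w+)")


def database_access_flags(log_text):
    """Return the flags named in database-access messages, in order of appearance."""
    return _DB_MESSAGE.findall(log_text)


# Output before the fix, as quoted in the changelog above.
before_fix = """\
2024-09-03 13:25:00 - INFO: Database will be accessed when using --hgvs
2024-09-03 13:25:00 - INFO: Database will be accessed when using --hgvsc
2024-09-03 13:25:00 - INFO: Database will be accessed when using --hgvsp
"""

print(database_access_flags(before_fix))
```

After the fix, the same helper applied to the output of test 3 should return a list containing only `--hgvs`.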