berntpopp / variant-linker

MIT License
Enhancement: Improve the stability of the annotation process by using genomic coordinates instead of HGVS. This will involve several steps, including using the region endpoint, converting VCF notation, and adding VCF detection. #14

Closed berntpopp closed 2 weeks ago

berntpopp commented 2 weeks ago

Current Behavior

Annotation currently relies on the VEP HGVS endpoint, which can be unstable and does not handle every variant reliably.

Proposed Solution

  1. Use Genomic Coordinates for VEP Annotation:

    • Deprecate the current vepAnnotation function and rename it to vepHgvsAnnotation.
    • Implement a new vepRegionsAnnotation function that uses the region endpoint (https://rest.ensembl.org/vep/homo_sapiens/region/3:319781-319781:1/-) for VEP annotation.
    • Create a function to convert VCF notation (e.g., 1-65568-A-C) to the Ensembl default format (1 65568 65568 A/C 1).
    • Feed the computed Ensembl default format into the new vepRegionsAnnotation function.
  2. Add VCF Detection Option:

    • Implement a function to detect the input format.
    • If VCF format is detected (e.g., 1-65568-A-C), skip the Variant Recoder step.
    • Directly transform the VCF notation into the Ensembl default format and use it in the vepRegionsAnnotation function.
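
The detection and conversion steps above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the helper names (`parseVcfNotation`, `isVcfNotation`, `toEnsemblDefault`, `toRegionUrl`) and the `GenomicVariant` type are hypothetical, the accepted input shape (`CHROM-POS-REF-ALT` with hyphens, optional `chr` prefix) is an assumption, and strand is fixed to `1` as in the examples above. The indel handling (dropping the shared VCF anchor base, and representing an insertion with start = end + 1) goes beyond the SNV example in this issue and should be checked against the Ensembl VEP input documentation before relying on it.

```typescript
// Hypothetical helpers; only the input and output formats come from the issue.
interface GenomicVariant {
  chrom: string;
  start: number;
  end: number;
  ref: string; // "" for a pure insertion
  alt: string; // "" for a pure deletion
}

// Assumed shape of VCF-style input: CHROM-POS-REF-ALT, e.g. "1-65568-A-C".
const VCF_PATTERN = /^(?:chr)?(\d{1,2}|X|Y|MT?)-(\d+)-([ACGT]+)-([ACGT]+)$/i;

// Returns the parsed variant, or null when the input is not VCF-style.
function parseVcfNotation(input: string): GenomicVariant | null {
  const match = VCF_PATTERN.exec(input.trim());
  if (match === null) return null;

  const chrom = match[1].toUpperCase();
  let start = parseInt(match[2], 10);
  let ref = match[3].toUpperCase();
  let alt = match[4].toUpperCase();

  // VCF indels share a leading anchor base; drop it and shift the start.
  while (
    ref.length > 0 &&
    alt.length > 0 &&
    ref[0] === alt[0] &&
    ref.length + alt.length > 2
  ) {
    ref = ref.slice(1);
    alt = alt.slice(1);
    start += 1;
  }

  // An empty ref (insertion) yields end = start - 1.
  return { chrom, start, end: start + ref.length - 1, ref, alt };
}

// Format detection: true means the Variant Recoder step can be skipped.
function isVcfNotation(input: string): boolean {
  return parseVcfNotation(input) !== null;
}

// Ensembl default format: "chrom start end ref/alt strand".
function toEnsemblDefault(v: GenomicVariant): string {
  return `${v.chrom} ${v.start} ${v.end} ${v.ref || "-"}/${v.alt || "-"} 1`;
}

// Region endpoint URL, following the pattern of the URL quoted in the issue.
function toRegionUrl(v: GenomicVariant): string {
  const base = "https://rest.ensembl.org/vep/homo_sapiens/region";
  return `${base}/${v.chrom}:${v.start}-${v.end}:1/${v.alt || "-"}`;
}
```

With these helpers, `1-65568-A-C` is detected as VCF notation and converts to `1 65568 65568 A/C 1`, while an HGVS string such as `ENST00000366667:c.803C>T` is not detected and would still go through the Variant Recoder.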

Acceptance Criteria