berntpopp / variant-linker

MIT License

Feature Request: Implement scoring functionality in `variant-linker` to calculate meta scores based on the VEP annotations obtained. #1

Closed: berntpopp closed this issue 3 months ago

berntpopp commented 6 months ago

Proposed Solution

Acceptance Criteria

berntpopp commented 4 months ago

This step should be built from two config files:

  1. One config file to extract the variables from the VEP annotation step. For example:
    [
      {
        "transcript_consequences": [
          {
            "impact": "impact_variant",
            "cadd_phred": "cadd_phred_variant"
          }
        ],
        "colocated_variants": [
          {
            "frequencies": {
              "T": {
                "gnomade": "gnomade_variant"
              }
            }
          }
        ]
      }
    ]
  2. One config file for the formula, built in steps. For example:
    [
      {
        "meta_score": "cadd_phred_variant * 2 + gnomade_variant * 10"
      }
    ]
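One reading of "built in steps" is an ordered list of formula objects in which each result becomes a variable that later steps can use. The sketch below assumes that reading; `evaluateFormula`, `runFormulaSteps`, and the second step `meta_score_scaled` are hypothetical. It evaluates each expression with JavaScript's `new Function` only after checking that the formula contains arithmetic characters and known variable names, which is a simple guard rather than a full expression parser.

```javascript
// Minimal evaluator (hypothetical): arithmetic characters only, and every
// identifier must be a known variable, so globals like `process` are rejected.
function evaluateFormula(expression, scope) {
  if (!/^[\w\s.+\-*\/()]+$/.test(expression)) {
    throw new Error(`Unsupported characters in formula: ${expression}`);
  }
  for (const id of expression.match(/\b[A-Za-z_]\w*\b/g) || []) {
    if (!Object.prototype.hasOwnProperty.call(scope, id)) {
      throw new Error(`Unknown variable in formula: ${id}`);
    }
  }
  const names = Object.keys(scope);
  return new Function(...names, `return (${expression});`)(...names.map((n) => scope[n]));
}

// Evaluate formula steps in order; each result is added to the scope
// so that later steps can reference it.
function runFormulaSteps(steps, variables) {
  const scope = { ...variables };
  for (const step of steps) {
    for (const [name, expression] of Object.entries(step)) {
      scope[name] = evaluateFormula(expression, scope);
    }
  }
  return scope;
}

const steps = [
  { meta_score: 'cadd_phred_variant * 2 + gnomade_variant * 10' },
  { meta_score_scaled: 'meta_score / 10' }, // hypothetical second step
];
const result = runFormulaSteps(steps, { cadd_phred_variant: 20, gnomade_variant: 0.5 });
console.log(result.meta_score, result.meta_score_scaled); // 45 4.5
```

Checking identifiers against the variable list before building the function keeps a typo in a formula from silently resolving to a global name.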
berntpopp commented 3 months ago

Description

Implement a new scoring feature in the variant-linker tool that calculates meta scores from the VEP annotations. Users can define custom scoring formulas over annotation values such as CADD Phred scores and gnomAD allele frequencies, which adds insight beyond the raw annotations.

Proposed Solution

  1. Configuration Files

    • Variable Assignment Configuration: This file will define how to extract and assign variables from the VEP annotation data.
    • Formula Configuration: This file will define the formula used to calculate the meta scores.
  2. Schema.org Style Configuration

    • Use JSON-LD schema definitions to provide a flexible, standardized way to define the configurations.
  3. Transformation and Conditional Logic

    • The configuration should allow transformations (e.g., setting a value to 0 if it is missing) and conditional logic for variable assignments (e.g., setting a variable based on the presence of specific terms).
  4. Integration with Variant-Linker

    • The new scoring functionality should be integrated into the variant-linker process and be enabled or disabled via a command-line option.
    • The computed scores should be added back to each relevant element in the variant array.
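The transformation and conditional-logic requirements could be expressed as per-variable rules applied after extraction. This is a minimal sketch under an assumed rule format that the issue does not define: a rule with `default` fills in a missing value, and a rule with `contains` sets a flag when a term is present (here a term in VEP's `consequence_terms` list). `applyRules`, `consequence_terms_variant`, and `missense_flag` are hypothetical names.

```javascript
// Hypothetical rule format (not defined in the issue):
//   { source, default }              -> transformation: fill a missing value
//   { source, contains, then, else } -> conditional: flag a present term
function applyRules(extracted, rules) {
  const out = { ...extracted };
  for (const [name, rule] of Object.entries(rules)) {
    const value = extracted[rule.source];
    if ('contains' in rule) {
      // Conditional assignment based on the presence of a specific term.
      const present = Array.isArray(value) ? value.includes(rule.contains) : value === rule.contains;
      out[name] = present ? rule.then : rule.else;
    } else if ('default' in rule) {
      // Transformation: use the default when the value is missing.
      out[name] = value === undefined || value === null ? rule.default : value;
    }
  }
  return out;
}

const extracted = {
  cadd_phred_variant: 20,
  consequence_terms_variant: ['missense_variant'],
  // gnomade_variant is absent, e.g. no colocated gnomAD frequency
};
const rules = {
  gnomade_variant: { source: 'gnomade_variant', default: 0 },
  missense_flag: { source: 'consequence_terms_variant', contains: 'missense_variant', then: 1, else: 0 },
};

const vars = applyRules(extracted, rules);
console.log(vars.gnomade_variant, vars.missense_flag); // 0 1
```

Keeping these rules separate from the extraction mapping lets the mapping stay a plain field-to-variable table, while defaults and flags live in one place.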

Detailed Plan

Configuration Files

  1. Variable Assignment Configuration

    • Define how to extract variables from VEP annotations.
    • Example:
      [
      {
       "transcript_consequences": [
         {
           "impact": "impact_variant",
           "cadd_phred": "cadd_phred_variant"
         }
       ],
       "colocated_variants": [
         {
           "frequencies": {
             "T": {
               "gnomad": "gnomad_variant"
             }
           }
         }
       ]
      }
      ]
  2. Formula Configuration

    • Define the formula for calculating the meta scores.
    • Example:
      [
      {
       "meta_score": "cadd_phred_variant * 2 + gnomad_variant * 10"
      }
      ]
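One way to apply the variable assignment configuration is to walk it in parallel with a VEP annotation object: a string leaf names the variable to fill, and an array applies its template to every element of the matching annotation array. This sketch assumes that when several transcript consequences carry the same field, the first value found wins; a real implementation might prefer the most severe consequence or a maximum instead. `extractVariables` and the sample annotation are hypothetical.

```javascript
// Walk the annotation alongside the mapping config (hypothetical helper).
// - string leaf: the variable name to fill from that annotation field
// - array: apply each template to every element of the annotation array
// If several elements provide a value, the first one found is kept (assumption).
function extractVariables(annotation, mapping, out = {}) {
  if (annotation === undefined || annotation === null) return out;
  if (Array.isArray(mapping)) {
    const elements = Array.isArray(annotation) ? annotation : [annotation];
    for (const element of elements) {
      for (const template of mapping) extractVariables(element, template, out);
    }
    return out;
  }
  for (const [key, target] of Object.entries(mapping)) {
    const value = annotation[key];
    if (typeof target === 'string') {
      if (value !== undefined && !Object.prototype.hasOwnProperty.call(out, target)) {
        out[target] = value;
      }
    } else {
      extractVariables(value, target, out);
    }
  }
  return out;
}

// Mapping copied from the variable assignment example.
const mapping = [
  {
    transcript_consequences: [{ impact: 'impact_variant', cadd_phred: 'cadd_phred_variant' }],
    colocated_variants: [{ frequencies: { T: { gnomad: 'gnomad_variant' } } }],
  },
];

// Hypothetical, heavily trimmed VEP annotation for one variant.
const annotation = {
  transcript_consequences: [
    { impact: 'MODERATE', cadd_phred: 20 },
    { impact: 'MODIFIER' },
  ],
  colocated_variants: [{ frequencies: { T: { gnomad: 0.5 } } }],
};

const vars = extractVariables(annotation, mapping);
console.log(vars.impact_variant, vars.cadd_phred_variant, vars.gnomad_variant); // MODERATE 20 0.5
```

The example configuration hard-codes the allele key `T`; for variants with a different alternate allele, the frequency lookup would need to follow that allele.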

Configuration Schema

  1. Variable Assignment Configuration Schema

    • Define the JSON-LD schema for variable assignment configuration.
    • Example:
      {
      "@context": "https://schema.org/",
      "@type": "Configuration",
      "variables": {
       "transcript_consequences": {
         "impact": "impact_variant",
         "cadd_phred": "cadd_phred_variant"
       },
       "colocated_variants": {
         "frequencies": {
           "T": {
             "gnomad": "gnomad_variant"
           }
         }
       }
      }
      }
  2. Formula Configuration Schema

    • Define the JSON-LD schema for formula configuration.
    • Example:
      {
      "@context": "https://schema.org/",
      "@type": "Configuration",
      "formulas": [
       {
         "meta_score": "cadd_phred_variant * 2 + gnomad_variant * 10"
       }
      ]
      }
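Before scoring, the two JSON-LD documents could be checked for the `@context`/`@type` header and for the expected `variables` and `formulas` keys. The helper below, `parseScoringConfig` (a hypothetical name), is only a structural check of the keys used in the examples; it does not expand or interpret JSON-LD.

```javascript
// Hypothetical helper: structural check of the two JSON-LD configuration documents.
// It only looks at the keys used in the examples; it does not process JSON-LD.
function parseScoringConfig(variableDoc, formulaDoc) {
  for (const doc of [variableDoc, formulaDoc]) {
    if (doc['@context'] !== 'https://schema.org/' || doc['@type'] !== 'Configuration') {
      throw new Error('Expected "@context": "https://schema.org/" and "@type": "Configuration"');
    }
  }
  const { variables } = variableDoc;
  if (typeof variables !== 'object' || variables === null || Array.isArray(variables)) {
    throw new Error('"variables" must be an object');
  }
  const { formulas } = formulaDoc;
  if (!Array.isArray(formulas)) {
    throw new Error('"formulas" must be an array');
  }
  for (const step of formulas) {
    for (const [name, expression] of Object.entries(step)) {
      if (typeof expression !== 'string') {
        throw new Error(`Formula "${name}" must be a string expression`);
      }
    }
  }
  return { variables, formulas };
}

// The two example documents.
const variableDoc = {
  '@context': 'https://schema.org/',
  '@type': 'Configuration',
  variables: {
    transcript_consequences: { impact: 'impact_variant', cadd_phred: 'cadd_phred_variant' },
    colocated_variants: { frequencies: { T: { gnomad: 'gnomad_variant' } } },
  },
};
const formulaDoc = {
  '@context': 'https://schema.org/',
  '@type': 'Configuration',
  formulas: [{ meta_score: 'cadd_phred_variant * 2 + gnomad_variant * 10' }],
};

const config = parseScoringConfig(variableDoc, formulaDoc);
console.log(Object.keys(config.formulas[0])); // [ 'meta_score' ]
```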

Integration with Variant-Linker

  1. Command-Line Option

    • Add a new command-line option to enable or disable the scoring functionality.
    • Example:
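      const yargs = require('yargs');

      const argv = yargs
        .option('scoring_config', {
          alias: 'sc',
          description: 'Path to the scoring configuration file',
          type: 'string'
        })
        // existing options...
        .help()
        .argv;

      // Scoring runs only when a configuration path is supplied.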
  2. Scoring Function Implementation

    • Implement the scoring function that reads the configuration files, applies the formulas, and adds the computed scores to the variant data.
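An end-to-end scoring function might tie these steps together: read both configuration files, extract variables per variant, evaluate the formulas in order, and attach the results to each variant. This is a sketch under several assumptions the issue does not fix: the function takes two file paths (the CLI example exposes a single `scoring_config` path), declared variables that are missing default to 0, the first matching value wins, and scores are added as top-level properties of each variant object. `applyScoring`, `extract`, `declaredNames`, and `evaluateFormula` are hypothetical names.

```javascript
const fs = require('fs');

// Arithmetic-only evaluator: every identifier must be a known variable,
// so a formula cannot reach globals such as `process`.
function evaluateFormula(expression, scope) {
  if (!/^[\w\s.+\-*\/()]+$/.test(expression)) {
    throw new Error(`Unsupported characters in formula: ${expression}`);
  }
  for (const id of expression.match(/\b[A-Za-z_]\w*\b/g) || []) {
    if (!Object.prototype.hasOwnProperty.call(scope, id)) {
      throw new Error(`Unknown variable in formula: ${id}`);
    }
  }
  const names = Object.keys(scope);
  return new Function(...names, `return (${expression});`)(...names.map((n) => scope[n]));
}

// Collect the variable names declared at the string leaves of the mapping config.
function declaredNames(mapping, names = []) {
  for (const target of Object.values(mapping)) {
    if (typeof target === 'string') names.push(target);
    else declaredNames(target, names);
  }
  return names;
}

// Walk an annotation alongside the mapping; the first value found for a variable wins.
function extract(data, mapping, out) {
  if (data === undefined || data === null) return out;
  if (Array.isArray(mapping)) {
    for (const item of Array.isArray(data) ? data : [data]) {
      for (const template of mapping) extract(item, template, out);
    }
    return out;
  }
  for (const [key, target] of Object.entries(mapping)) {
    if (typeof target !== 'string') {
      extract(data[key], target, out);
    } else if (data[key] !== undefined && !Object.prototype.hasOwnProperty.call(out, target)) {
      out[target] = data[key];
    }
  }
  return out;
}

// Hypothetical entry point: read both configs, score each variant, attach the scores.
function applyScoring(variants, variableConfigPath, formulaConfigPath) {
  const mapping = JSON.parse(fs.readFileSync(variableConfigPath, 'utf8'));
  const formulas = JSON.parse(fs.readFileSync(formulaConfigPath, 'utf8'));
  // Declared variables that are missing from a variant default to 0.
  const defaults = Object.fromEntries(declaredNames(mapping).map((name) => [name, 0]));
  return variants.map((variant) => {
    const scope = { ...defaults, ...extract(variant, mapping, {}) };
    const scores = {};
    for (const step of formulas) {
      for (const [name, expression] of Object.entries(step)) {
        scores[name] = evaluateFormula(expression, scope);
        scope[name] = scores[name]; // later steps may use earlier results
      }
    }
    // Return a new object so the original VEP annotation stays untouched.
    return { ...variant, ...scores };
  });
}
```

Returning new variant objects instead of mutating the input keeps the raw VEP response available, for example when debugging a formula.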
berntpopp commented 3 months ago

Implementation of testing will be handled in #5