gks-anvil / vrs_anvil_toolkit

Extract clinical variant interpretations from VCF using GA4GH VRS IDs
MIT License

feature/improve-readme #23

Closed bwalsh closed 8 months ago

bwalsh commented 8 months ago

README.md is out of date

bwalsh commented 8 months ago

Suggested changes

---//--


VRS AnVIL

Project Overview

This Python project processes Variant Call Format (VCF) files, or other sources of variant information, and looks up GA4GH Variation Representation Specification (VRS) identifiers for the variants they contain. VRS identifiers provide a standardized way to represent genomic variation, making genomic information easier to exchange and share.

In addition, the project retrieves evidence associated with genomic alleles by querying the GA4GH MetaKB (Meta-Knowledgebase) service. MetaKB links genomic variants to relevant evidence, giving users access to the interpretations recorded for those alleles.

Features

  1. VCF File Processing:

    • The project includes modules for reading and parsing VCF files, extracting relevant genomic information.
  2. GA4GH VRS Identifier Lookup:

    • Utilizes the GA4GH VRS API to perform lookups for each genomic variation mentioned in the VCF file.
    • Retrieves standardized identifiers for the variations, enhancing interoperability with GA4GH-compliant systems.
    • GA4GH MetaKB Service Integration: Utilizes the GA4GH MetaKB service to query and retrieve evidence associated with the specified genomic alleles.
  3. Output Generation:

    • Generates summary metrics about throughput, errors, and evidence hits and misses.
    • Optionally, generates a processed VCF file with additional GA4GH VRS identifiers for each genomic variation.
    • Presents the retrieved evidence in a structured format, including information about studies, publications, and other relevant details.
  4. Error Handling:

    • Handles errors such as invalid input files, connectivity problems with the GA4GH VRS API, and invalid variants.
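The features above can be illustrated with a minimal sketch. It is not the project's implementation: it uses only the standard library, the names `iter_variants`, `summarize`, and `fake_lookup` are hypothetical, and the lookup is a stub standing in for the real VRS and MetaKB calls. It assumes variants are keyed as `chrom-pos-ref-alt` strings and that a failed lookup raises `ValueError`.

```python
import io
from collections import Counter
from typing import Callable, Iterator, List, TextIO


def iter_variants(vcf: TextIO) -> Iterator[str]:
    """Yield one 'chrom-pos-ref-alt' string per alternate allele in a VCF."""
    for line in vcf:
        # Skip blank lines, meta-information lines (##) and the header (#CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed record, not enough columns
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):
            if alt in (".", "*"):
                continue  # no alternate allele, or an overlapping deletion
            yield f"{chrom}-{pos}-{ref}-{alt}"


def summarize(vcf: TextIO, lookup: Callable[[str], List[str]]) -> Counter:
    """Tally totals, evidence hits, misses, and errors for every variant."""
    metrics: Counter = Counter()
    for variant in iter_variants(vcf):
        metrics["total"] += 1
        try:
            evidence = lookup(variant)
        except ValueError:
            # Assumption: an invalid variant surfaces as ValueError.
            metrics["errors"] += 1
            continue
        metrics["hits" if evidence else "misses"] += 1
    return metrics


# Made-up data for illustration only.
SAMPLE = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG,T\t.\t.\t.\n"
    "chr2\t200\t.\tC\t.\t.\t.\t.\n"
    "chr7\t300\t.\tG\tA\t.\t.\t.\n"
)


def fake_lookup(variant: str) -> List[str]:
    """Hypothetical stand-in for the VRS translation and MetaKB query."""
    if variant.startswith("chr7"):
        raise ValueError("variant could not be translated")
    known = {"chr1-100-A-G": ["evidence-1"]}
    return known.get(variant, [])


metrics = summarize(io.StringIO(SAMPLE), fake_lookup)
print(dict(metrics))
```

Multi-allelic records are expanded to one entry per alternate allele so that each allele is counted separately in the summary.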

Getting Started

Prerequisites

Installation

  1. Clone the repository:

    git clone https://github.com/ohsu-comp-bio/vrs-anvil
    cd vrs-anvil
  2. Install dependencies:

🚧 Quinn Wai Please update

    pip install -r requirements.txt

Usage

🚧 Quinn Wai Please update

  1. Run the VCF processor:

    vrs_anvil input.vcf

    Replace input.vcf with the name of your VCF file.

  2. The processed VCF file with GA4GH VRS identifiers will be generated as output_processed.vcf in the same directory.
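As a rough sketch of how the identifiers in a processed VCF might be read back, the snippet below collects them from the INFO column using only the standard library. The function name `extract_vrs_ids` is hypothetical, and the INFO key `VRS_Allele_IDs` is an assumption; check the header of your own output file for the key actually written. The identifiers in the sample are placeholders, not real VRS IDs.

```python
import io
from typing import Dict, List, TextIO


def extract_vrs_ids(vcf: TextIO, info_key: str = "VRS_Allele_IDs") -> Dict[str, List[str]]:
    """Map 'chrom-pos-ref-alt' to the identifiers stored under info_key."""
    ids: Dict[str, List[str]] = {}
    for line in vcf:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # INFO is the eighth column; skip short records
        chrom, pos, _id, ref, alt = fields[:5]
        # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no '=').
        info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
        if info_key in info:
            ids[f"{chrom}-{pos}-{ref}-{alt}"] = info[info_key].split(",")
    return ids


# Made-up record; the identifiers are placeholders.
PROCESSED = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t.\t.\tDP=10;VRS_Allele_IDs=ga4gh:VA.EXAMPLE1,ga4gh:VA.EXAMPLE2\n"
    "chr2\t200\t.\tC\tT\t.\t.\tDP=12\n"
)

vrs_ids = extract_vrs_ids(io.StringIO(PROCESSED))
print(vrs_ids)
```

Records without the key are omitted from the result, which makes it easy to spot variants that were not annotated.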

Contributing

This project is open to contributions from the research community. To get involved, contact the project team and see the contributing guide for details.

License

🚧 Quinn Wai Please update, create license file

This project is licensed under the MIT License - see the LICENSE file for details.