feature/presentation-outline

Problem Statement: Connecting VCF processing to clinical evidence efficiently can be challenging
What tools exist out there?
- bcftools / pysam:
  - Do VCF processing / filtering / etc.
  - [sensitive to incorrectly formatted VCF headers / limited scope]
- vrs-python:
  - Able to translate variants into fully justified alleles using coordinate info!
  - Can even annotate your VCF w/ VRS IDs and write VRS objects
  - [Not connected to clinical evidence]
- metakb:
  - clinical evidence in a meta knowledgebase
  - Contains CIVIC and MOA data
  - Has VRS IDs and an API for looking up specific studies
  - [no connection to VCFs]
Solution: vrs_anvil annotate
- Outline tool here (Walsh’s flow diagram) — this could be used as a reference for all other pieces
- What it does
  - organizes settings in a manifest to pull VCFs
  - Gets allele VRS ID per variant using vrs-python w/ threading, multiprocessing, and caching
  - identifies hits against a local metakb cache (the hit rate is low, so no need to query the API every time)
  - writes to logs and metrics file
  - packaged in a CLI!
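
A minimal sketch of the flow above, for slide purposes only. Every name here is hypothetical: `placeholder_allele_id` is a hash-based stand-in for the vrs-python translation step (it does NOT compute a real VRS ID), `evidence_cache` stands in for the local metakb cache, and `annotate` is not the actual vrs_anvil implementation.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=None)
def placeholder_allele_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Hypothetical stand-in for VRS ID generation (NOT a real VRS digest).

    lru_cache means a variant seen again is not recomputed.
    """
    key = f"{chrom}:{pos}:{ref}:{alt}".encode()
    return "placeholder:" + hashlib.sha256(key).hexdigest()[:16]


def annotate(variants, evidence_cache, max_workers=4):
    """Compute an ID per variant in a thread pool, then look each up locally."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        ids = list(pool.map(lambda v: placeholder_allele_id(*v), variants))
    # A set lookup against the local cache avoids one API call per variant.
    hits = [(v, i) for v, i in zip(variants, ids) if i in evidence_cache]
    metrics = {"total": len(variants), "hits": len(hits)}
    return hits, metrics


# Made-up variants as (chrom, pos, ref, alt); the first one is repeated.
variants = [
    ("chr7", 140753336, "A", "T"),
    ("chr1", 100, "G", "C"),
    ("chr7", 140753336, "A", "T"),
]
# Pretend the local evidence cache knows about exactly one allele.
evidence_cache = {placeholder_allele_id("chr7", 140753336, "A", "T")}

hits, metrics = annotate(variants, evidence_cache)
print(metrics)
```

The point of the sketch is the shape: cached ID generation, parallel fan-out, and a cheap local membership test, with a small metrics dict at the end.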
How we might use it: (1000G proof of concept)
- Gather VCFs required through Terra
- Create manifest.yaml
- Run nohup vrs_anvil annotate --scatter &
- Analysis: 1000-figures.ipynb
  - Get the study IDs associated with each metakb cache hit
  - Get % samples with variant match
  - Visualize number of samples per patient
  - Give a few example descriptions of study hits
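
A rough sketch of the kind of summary the analysis steps describe, assuming hits are available as (sample, allele, study) records. The record layout, sample names, and study IDs below are made up for illustration and are not taken from the notebook.

```python
from collections import Counter

# Hypothetical hit records: (sample_id, allele_id, study_id).
hit_records = [
    ("sample_A", "allele_1", "study_X"),
    ("sample_A", "allele_2", "study_Y"),
    ("sample_B", "allele_1", "study_X"),
]
# Hypothetical full cohort, including samples with no hits.
all_samples = ["sample_A", "sample_B", "sample_C", "sample_D"]

# Study IDs associated with any cache hit.
study_ids = sorted({study for _, _, study in hit_records})

# Percentage of samples with at least one variant match.
samples_with_hit = {sample for sample, _, _ in hit_records}
pct_with_match = 100 * len(samples_with_hit) / len(all_samples)

# Hits per sample, ready to feed into a bar chart.
hits_per_sample = Counter(sample for sample, _, _ in hit_records)

print(study_ids, pct_with_match, dict(hits_per_sample))
```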
Discussion: Pros and Cons and when to use
- vrs-python VCFAnnotator: direct from source, need translation only, need annotated VCFs
- metakb api: already have set of identifiers and just want study results
- vrs_anvil: need threading / multiprocessing baked in, want to connect to metakb, error handling, organized file runs [LOOK AT ISSUES resolved to see features]
Further work:
- Cohort Allele Frequency data
- ???
Wanna experiment? Use this? Contribute?
- repo
- vrs-anvil workspace
Thanks and acknowledgements
