What feature is being added or bug is being addressed?
This PR creates a script that filters a parsed VCF file to retain one gene annotation row per variant. The resulting data frame is subsequently merged with AutoGVP output to create the final comprehensive and abridged outputs.
What was your approach?
04-filter_gene_annotations.R performs the following:
Separates the CSQ column in the parsed VCF file so that subfields are column-separated (separate_wider_delim) and gene/transcript annotations are row-separated (separate_longer_delim)
Uses the PICK column to retain a single gene annotation row per variant (chosen based on canonical transcript status, transcript support level, transcript type, highest-impact gene consequence, etc.)
Merges the filtered VCF data frame with output from 01-annotate_variants_CAVATICA.R or 01-annotate_variants_custom.R
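The three steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the code in 04-filter_gene_annotations.R: the subfield names in csq_fields, the vcf_id join key, the final_call column, and all example values are assumptions made for the sketch. In practice the CSQ subfield names would come from the VEP header of the VCF.

```r
library(dplyr)
library(tidyr)

# Hypothetical CSQ subfield names; the real list is defined by the VEP
# "Format:" string in the VCF header and is much longer.
csq_fields <- c("Allele", "Consequence", "SYMBOL", "Feature", "PICK")

# Toy parsed VCF: one row per variant, annotations comma-separated in CSQ,
# subfields pipe-separated within each annotation.
vcf_df <- tibble(
  vcf_id = c("chr1-100-A-G", "chr2-200-C-T"),
  CSQ = c(
    "G|missense_variant|GENE1|ENST0001|1,G|intron_variant|GENE2|ENST0002|",
    "T|synonymous_variant|GENE3|ENST0003|1"
  )
)

filtered_df <- vcf_df %>%
  # One row per gene/transcript annotation
  separate_longer_delim(CSQ, delim = ",") %>%
  # One column per CSQ subfield
  separate_wider_delim(CSQ, delim = "|", names = csq_fields) %>%
  # Keep only the annotation flagged by VEP as the picked one
  filter(PICK == "1")

# Toy stand-in for the AutoGVP annotation output (hypothetical columns)
autogvp_df <- tibble(
  vcf_id = c("chr1-100-A-G", "chr2-200-C-T"),
  final_call = c("Pathogenic", "Benign")
)

# Attach the single retained gene annotation to each variant call
merged_df <- left_join(autogvp_df, filtered_df, by = "vcf_id")
```

Splitting into rows before splitting into columns keeps each annotation's subfields together, so the PICK filter operates on whole annotations.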
What GitHub issue does your pull request address?
81
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Please run script on test data as follows and review output:
Please review code used to parse the CSQ column and to select unique gene annotations
Is there anything that you want to discuss further?
This script should be robust in cases where the VEP CSQ field is present, although it needs to be tested on additional data sets.
Documentation Checklist