diskin-lab-chop / AutoGVP

Add gene annotation filtering, final output script #99

Closed rjcorb closed 1 year ago

rjcorb commented 1 year ago

Purpose/implementation Section

What feature is being added or bug is being addressed?

This PR adds a script that filters the parsed VCF file to retain one gene annotation row per variant. The resulting data frame is then merged with the AutoGVP output to create the final comprehensive and abridged outputs.

What was your approach?

04-filter_gene_annotations.R performs the following:

What GitHub issue does your pull request address?

#81

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Please run the script on the test data as follows and review the output:

Rscript 04-filter_gene_annotations.R --vcf input/test_pbta_filtered_parsed_vcf.tsv \
--autogvp input/test_pbta.cavatica_input.annotations_report.abridged.tsv \
--output "test_cavatica_pbta"

Please review the code used to parse the CSQ column and to select unique gene annotations.
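For reviewers less familiar with the VEP CSQ layout, here is a minimal sketch of the general idea, written in Python purely for illustration. It is not the code in 04-filter_gene_annotations.R. It assumes the usual VEP convention that a CSQ value holds comma-separated annotations, each with pipe-separated sub-fields whose order is given in the VCF header. The field subset, the example values, the helper names `parse_csq` and `pick_annotation`, and the selection rule (prefer the annotation flagged `PICK`, otherwise take the first) are all assumptions made for this example; the script's actual criteria may differ.

```python
# Hypothetical sketch only; not the logic of 04-filter_gene_annotations.R.

def parse_csq(csq_value, csq_fields):
    """Split one CSQ string into a list of dicts, one per annotation."""
    annotations = []
    for entry in csq_value.split(","):      # one entry per transcript/gene annotation
        values = entry.split("|")           # sub-fields within an annotation
        # Pad short entries so every expected field has a value (possibly empty).
        values += [""] * (len(csq_fields) - len(values))
        annotations.append(dict(zip(csq_fields, values)))
    return annotations


def pick_annotation(annotations):
    """Return one annotation: the first with PICK == "1", else the first listed.

    This selection rule is an assumption made for the example.
    """
    for ann in annotations:
        if ann.get("PICK") == "1":
            return ann
    return annotations[0]


# Assumed subset of CSQ sub-fields. In a real VCF the order comes from the
# "Format: ..." text in the ##INFO=<ID=CSQ,...> header line.
CSQ_FIELDS = ["Allele", "Consequence", "SYMBOL", "Feature", "PICK"]

# Made-up CSQ value with two annotations for a single variant.
csq = "T|missense_variant|GENE1|TRANSCRIPT_A|1,T|downstream_gene_variant|GENE2|TRANSCRIPT_B|"

selected = pick_annotation(parse_csq(csq, CSQ_FIELDS))
print(selected["SYMBOL"], selected["Consequence"])  # GENE1 missense_variant
```

Whatever rule the script applies, the point to check is that it is deterministic, so each variant always ends up with exactly one annotation row.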

Is there anything that you want to discuss further?

The script should be robust whenever the VEP CSQ field is present, although it still needs to be tested on additional data sets.

Documentation Checklist