Closed rjcorb closed 1 year ago
@jharenza I have updated the wrapper bash script to include the arguments necessary to run select-clinVar-submissions.R. It should now run under any conditions with the current conditional statements: if the variant_summary and submission_summary files are not provided, they will be searched for, and if they are not found they will be downloaded before the script runs. Default settings (no conceptID, resolve with latest date) are assumed unless explicitly stated otherwise when running run_autogvp.sh.
@jharenza I've updated the conditional statements and repo README. The -f and ! -e notation works for checking whether files exist, but to check whether a variable has a null value I used -z/-n (variable is null / variable is not null). Another ChatGPT rec!
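For reference, the file and variable checks described above can be sketched roughly as follows. This is a minimal illustration, not the code in run_autogvp.sh: the helper names have_file and value_or_default are hypothetical, and the variable names are assumed to mirror the script's arguments.

```shell
# Sketch only: helper names are hypothetical and not taken from run_autogvp.sh.

# Succeeds if the path exists and is a regular file (-f test).
have_file() {
  [ -f "$1" ]
}

# Prints $1, or the fallback $2 when $1 is empty (-z test).
value_or_default() {
  if [ -z "$1" ]; then
    echo "$2"
  else
    echo "$1"
  fi
}

# Assumed variable names; empty when the caller did not supply them.
variant_summary="${variant_summary:-}"
conflict_res="$(value_or_default "${conflict_res:-}" "latest")"

# -n is true when the variable is non-empty.
if [ -n "$variant_summary" ] && have_file "$variant_summary"; then
  echo "using provided file: $variant_summary"
else
  echo "variant_summary not provided or not found; the wrapper would search for or download it here"
fi
```

Using -z/-n for variables and -f/-e for paths keeps the "was an argument given?" question separate from the "does the file exist?" question.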
Purpose/implementation Section
What feature is being added or bug is being addressed?
Closes #190. This PR updates select-clinVar-submissions.R to allow the following parameters:
- conceptID_list: a ClinVar concept ID data file used to filter submissions of conflicting variants to those associated with any provided concept ID. The script will take any variant with 1 remaining submission as the final call, and resolve the others through consensus calling, latest date evaluated, or most severe call (see below).
- conflict_res: either latest (default) or most_severe. If latest, conflicting variants with submissions associated with concept IDs will be resolved by taking the most recent submission. If most_severe, conflicts are resolved by prioritizing the most severe call (Pathogenic > Likely pathogenic > Uncertain significance > Likely benign > Benign).
What was your approach?
conceptID_list is provided and if conflict_res == "latest" or conflict_res == "most_severe"
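To make the two conflict_res strategies concrete, here is a rough shell/awk sketch. It is not the R implementation in select-clinVar-submissions.R: the function name resolve_conflicts and the three-column tab-separated input (variant ID, call, date evaluated as YYYY-MM-DD) are assumptions for illustration, and it leaves out concept ID filtering and consensus calling.

```shell
# Sketch only: resolve_conflicts and its input layout are hypothetical.
# Reads tab-separated lines on stdin: variant_id, call, date_evaluated (YYYY-MM-DD).
# $1 is the strategy: "latest" or "most_severe".
resolve_conflicts() {
  awk -F'\t' -v strategy="$1" '
    BEGIN {
      # Severity order from the PR description, most severe first.
      rank["Pathogenic"] = 5
      rank["Likely pathogenic"] = 4
      rank["Uncertain significance"] = 3
      rank["Likely benign"] = 2
      rank["Benign"] = 1
    }
    {
      id = $1
      if (!(id in best_call)) {
        # First submission seen for this variant.
        order[++n] = id
        better = 1
      } else if (strategy == "most_severe") {
        better = (rank[$2] > rank[best_call[id]])
      } else {
        # ISO dates sort correctly as strings; force string comparison.
        new_date = $3 ""
        old_date = best_date[id] ""
        better = (new_date > old_date)
      }
      if (better) {
        best_call[id] = $2
        best_date[id] = $3
      }
    }
    END {
      for (i = 1; i <= n; i++) print order[i] "\t" best_call[order[i]]
    }
  '
}
```

On ties the first submission seen is kept. The real script may break ties differently.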
What GitHub issue does your pull request address?
190
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Please check the new code logic, and test run select-clinVar-submissions.R using all combinations of parameters. There currently isn't an option to specify the name of the output file, so the --outdir argument will need to be changed to save different output versions.
No concept ID list provided:
Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --outdir <out_dir>
Concept ID list provided, latest call strategy:
Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --conceptID_list data/clinvar_cpg_concept_ids.tsv --conflict_res "latest" --outdir <out_dir>
Concept ID list provided, most severe strategy:
Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --conceptID_list data/clinvar_cpg_concept_ids.tsv --conflict_res "most_severe" --outdir <out_dir>
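Since each run needs its own --outdir, the three invocations above could be generated with a small helper like the one below. This sketch only prints the commands and does not run them. The build_command helper and the results/... directory names are hypothetical. The script path, data paths, and flags are copied from the commands above.

```shell
# Sketch only: prints one command per parameter combination without running it.
# The results/<label> output directories are hypothetical names.
base="Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz"
concept_ids="data/clinvar_cpg_concept_ids.tsv"

# $1 is a conflict_res value, or "none" to omit the concept ID list.
build_command() {
  case "$1" in
    none)
      echo "$base --outdir results/no_concept_ids"
      ;;
    latest|most_severe)
      echo "$base --conceptID_list $concept_ids --conflict_res \"$1\" --outdir results/$1"
      ;;
    *)
      echo "unknown strategy: $1" >&2
      return 1
      ;;
  esac
}

for strategy in none latest most_severe; do
  build_command "$strategy"
done
```

Piping the printed lines to sh would execute them. Keeping a separate directory per run makes it easier to compare the final calls afterwards.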
The concept ID list used in the commands above (data/clinvar_cpg_concept_ids.tsv) contains all concept IDs associated with cancer predisposition gene variants.
Is there anything that you want to discuss further?
Does the current code logic agree with what was discussed in our group meeting?
9766/101811 (9.6%) of variants with conflicting interpretations currently have discordant final calls between the three runs shown above.
Documentation Checklist