diskin-lab-chop / AutoGVP

wrapper script #97

Closed naqvia closed 1 year ago

naqvia commented 1 year ago

Purpose/implementation Section

Create a wrapper script that runs all the scripts for the AutoGVP tool, including the filter step, AutoGVP calling, and final table generation.

What feature is being added or bug is being addressed?

https://github.com/diskin-lab-chop/AutoGVP/issues/82

What was your approach?

A shell script that takes the necessary input parameters to run the entire tool. The script calls each component script separately, in sequential order, and writes out the final tables.
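As a rough illustration of this approach (not the actual contents of run_autogvp.sh), the sketch below parses `--key=value` options and then runs placeholder steps in order. The function names `parse_args` and `run_steps` are hypothetical, only a subset of the options is handled, and the `echo` lines stand in for the real filter, calling, and table-generation scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a wrapper: parse --key=value options, then run steps in order.
# The echo lines are placeholders, not the real AutoGVP scripts.

parse_args() {
  workflow="" vcf="" outdir="" out=""
  for arg in "$@"; do
    case "$arg" in
      --workflow=*) workflow="${arg#*=}" ;;   # strip everything up to the first "="
      --vcf=*)      vcf="${arg#*=}" ;;
      --outdir=*)   outdir="${arg#*=}" ;;
      --out=*)      out="${arg#*=}" ;;
      *) echo "Unknown option: $arg" >&2; return 1 ;;
    esac
  done
}

run_steps() {
  echo "step 1: filter $vcf"
  echo "step 2: annotate ($workflow workflow)"
  echo "step 3: write $outdir/$out tables"
}

parse_args --workflow="custom" --vcf=input/test_VEP.vcf --outdir=../results --out="test_custom"
run_steps
```

Keeping parsing separate from execution makes it easier to validate the inputs once, before any long-running step starts.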

What GitHub issue does your pull request address?

https://github.com/diskin-lab-chop/AutoGVP/issues/82

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

First, run select-clinVar-submissions.R within the AutoGVP/ directory (required for the AutoGVP wrapper script):

Rscript select-clinVar-submissions.R --variant_summary input/variant_summary.txt.gz --submission_summary input/submission_summary.txt.gz

Then run the AutoGVP wrapper script on the test files using the following parameters:

For CAVATICA:

bash run_autogvp.sh --workflow="cavatica" \
--vcf=input/test_pbta.single.vqsr.filtered.vep_105.vcf \
--filter_criteria='gnomad_3_1_1_AF_non_cancer<0.1|gnomad_3_1_1_AF_non_cancer="."' \
--intervar=input/test_pbta.hg38_multianno.txt.intervar \
--multianno=input/test_pbta.hg38_multianno.txt \
--autopvs1=input/test_pbta.autopvs1.tsv \
--outdir=../results \
--out="test_pbta"

For custom workflow:

bash run_autogvp.sh --workflow="custom" \
--vcf=input/test_VEP.vcf \
--clinvar=input/clinvar.vcf.gz \
--intervar=input/test_VEP.hg38_multianno.txt.intervar \
--multianno=input/test_VEP.vcf.hg38_multianno.txt \
--autopvs1=input/test_autopvs1.txt \
--outdir=../results \
--out="test_custom"

Which areas should receive a particularly close look?

Review run_autogvp.sh and assess whether the script runs successfully. Inspect the output files test_pbta-autogvp-annotated-abridged.tsv and test_pbta-autogvp-annotated-full.tsv.

Is there anything that you want to discuss further?

We will need to test this on additional samples, especially those produced from custom workflows. The current custom test files do not run successfully, because issues with the VCF header cause the VCF filtering and parsing scripts to fail.

naqvia commented 1 year ago

Works as expected. The pre-processing step speeds up the tool significantly. I added a set -e statement to the script so that it exits as soon as any command fails, instead of continuing through the rest of the script.
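A minimal sketch of what `set -e` changes, assuming bash is on the PATH. The three-line `demo` script is a hypothetical stand-in for the wrapper: `false` plays the role of a failing step, and the step names are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a wrapper script: the second command fails.
demo='set -e
echo "filter"
false
echo "annotate"'

# Run the demo in a fresh bash process and capture its output and exit status.
status=0
output=$(bash -c "$demo") || status=$?

echo "output: $output"        # prints "output: filter" ("annotate" is never reached)
echo "exit status: $status"   # prints "exit status: 1"
```

Without the `set -e` line, the same demo would also print "annotate" and exit with status 0, which would hide the failed step.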

naqvia commented 1 year ago

Looks good to me. We just need to update the README on the main repo and we'll be in good shape! cc @jharenza

rjcorb commented 1 year ago

@naqvia @jharenza @rebkau I have made modifications to allow the wrapper script to work on custom workflows. @naqvia, can you please re-test this using the custom test files to ensure it runs and check output files? I've included the usage in the PR text. @rebkau -- I know it will take some time to process the PBTA files with the custom workflow, but the wrapper script is ready to test when the input files are generated.