diskin-lab-chop / AutoGVP


rm duplicate cols in intervar/multianno dfs #147

Closed rjcorb closed 1 year ago

rjcorb commented 1 year ago

Purpose/implementation Section

What feature is being added or bug is being addressed?

Closes #146. This PR removes duplicated columns that are introduced when merging intervar and multianno files.

What was your approach?

Modified 01-annotate_variants_*_input.R to remove redundant columns from the intervar data frame before merging it with the multianno data frame.

What GitHub issue does your pull request address?

#146

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Please test on the PBTA test files and confirm that the output from 01-annotate_variants_CAVATICA_input.R does not contain duplicated columns:

bash run_autogvp.sh --workflow="cavatica" \
--vcf=input/test_pbta.single.vqsr.filtered.vep_105.vcf \
--filter_criteria='INFO/AF>=0.2 INFO/DP>=15 (gnomad_3_1_1_AF_non_cancer<0.01|gnomad_3_1_1_AF_non_cancer=".")' \
--intervar=input/test_pbta.hg38_multianno.txt.intervar \
--multianno=input/test_pbta.hg38_multianno.txt \
--autopvs1=input/test_pbta.autopvs1.tsv \
--outdir=../results \
--out="test_pbta"

NOTE: To confirm that test_pbta.custom_input.annotations_report.abridged.tsv does not contain duplicated columns, please comment out the last line of run_autogvp.sh, which deletes the intermediate files:

rm $autogvp_input $vcf_parsed_file $out_dir/$autogvp_output $out_dir/$out_file.filtered_csq_subfields.tsv $out_dir/${out_file}_multianno_filtered.txt $out_dir/${out_file}_autopvs1_filtered.tsv $out_dir/${out_file}_intervar_filtered.txt
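
One possible way to run the duplicated-column check requested above is to inspect the header row of the tab-separated output. The sketch below is not part of this PR: `find_duplicate_columns` is a hypothetical helper name, and it assumes that duplicates show up either as identical header names or as names ending in `.x`/`.y` (the default suffixes R's `merge()` and dplyr joins append to clashing columns). The file path is passed as an argument, so it makes no assumption about where the output lives.

```shell
# Hypothetical helper (not in the AutoGVP repo): list suspicious column names
# found in the header row of a tab-separated file.
#   1. names that occur more than once
#   2. names ending in .x or .y (assumed merge/join suffixes)
# Prints nothing when the header looks clean.
find_duplicate_columns() {
  # Split the header into one column name per line, then keep repeated names.
  head -n 1 "$1" | tr '\t' '\n' | sort | uniq -d
  # grep exits non-zero when nothing matches; "|| true" keeps the function's
  # exit status clean in that case.
  head -n 1 "$1" | tr '\t' '\n' | grep -E '\.(x|y)$' || true
}
```

Calling `find_duplicate_columns` with the path to an output or intermediate TSV should print nothing if the fix works; any printed name is a column worth a closer look.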

Is there anything that you want to discuss further?

No

Documentation Checklist