Closed rjcorb closed 1 year ago
Closes #146. This PR removes duplicated columns that are introduced when merging intervar and multianno files.
Modified 01_annotate_variants_*_input.R to remove redundant columns from the intervar file before merging it with the multianno df.
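For reviewers who want to see the idea outside of R, here is a minimal shell sketch of "drop the redundant columns before merging". It is not the code in this PR: the helper name `drop_shared_cols`, the join keys, and the toy column names are all hypothetical, and it assumes tab-delimited files with a single header row.

```shell
# drop_shared_cols KEYS A B  (hypothetical helper, not part of AutoGVP)
# Print table B without the columns whose header names also occur in table A.
# Columns listed in the comma-separated KEYS are always kept so the two
# tables can still be joined afterwards.
drop_shared_cols() {
  awk -F'\t' -v OFS='\t' -v keys="$1" -v hdr="$(head -n 1 "$2")" '
    BEGIN {
      n = split(keys, k, ","); for (i = 1; i <= n; i++) iskey[k[i]] = 1
      n = split(hdr, h, "\t"); for (i = 1; i <= n; i++) in_a[h[i]] = 1
    }
    # On the header row of B, record the positions of the columns to keep.
    NR == 1 {
      for (i = 1; i <= NF; i++)
        if (($i in iskey) || !($i in in_a)) keep[++m] = i
    }
    # For every row (header included), print only the kept columns.
    {
      line = $(keep[1])
      for (j = 2; j <= m; j++) line = line OFS $(keep[j])
      print line
    }
  ' "$3"
}

# Toy demo with made-up columns (assumption: Chr and Start are the join keys).
demo_dir=$(mktemp -d)
printf 'Chr\tStart\tRef\tAlt\tGene\nchr1\t100\tA\tG\tTP53\n' > "$demo_dir/multianno.tsv"
printf 'Chr\tStart\tRef\tAlt\tInterVar\nchr1\t100\tA\tG\tBenign\n' > "$demo_dir/intervar.tsv"
drop_shared_cols "Chr,Start" "$demo_dir/multianno.tsv" "$demo_dir/intervar.tsv"
# Prints (tab-separated):
#   Chr   Start  InterVar
#   chr1  100    Benign
rm -r "$demo_dir"
```

Because the shared non-key columns (Ref and Alt in the toy data) are gone from the second table, a later merge on the keys has nothing left to duplicate.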
Please test on the pbta test files and confirm that the output from 01-annotate_variants_CAVATICA_input.R does not contain duplicated columns:
bash run_autogvp.sh --workflow="cavatica" \
  --vcf=input/test_pbta.single.vqsr.filtered.vep_105.vcf \
  --filter_criteria='INFO/AF>=0.2 INFO/DP>=15 (gnomad_3_1_1_AF_non_cancer<0.01|gnomad_3_1_1_AF_non_cancer=".")' \
  --intervar=input/test_pbta.hg38_multianno.txt.intervar \
  --multianno=input/test_pbta.hg38_multianno.txt \
  --autopvs1=input/test_pbta.autopvs1.tsv \
  --outdir=../results \
  --out="test_pbta"
NOTE: to confirm that test_pbta.custom_input.annotations_report.abridged.tsv does not contain duplicated columns, please comment out the last line from run_autogvp.sh:
rm $autogvp_input $vcf_parsed_file $out_dir/$autogvp_output $out_dir/$out_file.filtered_csq_subfields.tsv $out_dir/${out_file}_multianno_filtered.txt $out_dir/${out_file}_autopvs1_filtered.tsv $out_dir/${out_file}_intervar_filtered.txt
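One quick way to do the check, as a sketch: the helper name `find_dup_cols` is hypothetical, and it assumes the report is tab-delimited with a single header row. It lists header names that occur more than once, and it also treats names ending in `.x` or `.y` as duplicates, since those are the default suffixes R's `merge()` and dplyr joins append when both inputs share a non-key column. Whether the script's merge produces such suffixes is an assumption here.

```shell
# find_dup_cols FILE  (hypothetical helper, not part of AutoGVP)
# Print each column name that appears more than once in the header of a
# tab-delimited file. A trailing .x or .y is stripped first, so a pair such
# as Ref.x / Ref.y is reported as the duplicated name Ref.
find_dup_cols() {
  head -n 1 "$1" | tr '\t' '\n' | sed 's/\.[xy]$//' | sort | uniq -d
}

# Toy demo on a made-up header.
demo_file=$(mktemp)
printf 'Chr\tStart\tRef.x\tRef.y\nchr1\t100\tA\tA\n' > "$demo_file"
find_dup_cols "$demo_file"   # prints: Ref
rm -f "$demo_file"
```

Running `find_dup_cols` on test_pbta.custom_input.annotations_report.abridged.tsv (in the --outdir used above) should print nothing if the fix works. A column whose real name happens to end in `.x` or `.y` could cause a false positive.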
Purpose/implementation Section
What feature is being added or bug is being addressed?
Closes #146. This PR removes duplicated columns that are introduced when merging intervar and multianno files.
What was your approach?
Modified 01_annotate_variants_*_input.R to remove redundant columns from intervar file before merging with multianno df.
What GitHub issue does your pull request address?
146
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
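Please test on pbta test files, and confirm output from 01-annotate_variants_CAVATICA_input.R does not contain duplicated columns:
bash run_autogvp.sh --workflow="cavatica" \
  --vcf=input/test_pbta.single.vqsr.filtered.vep_105.vcf \
  --filter_criteria='INFO/AF>=0.2 INFO/DP>=15 (gnomad_3_1_1_AF_non_cancer<0.01|gnomad_3_1_1_AF_non_cancer=".")' \
  --intervar=input/test_pbta.hg38_multianno.txt.intervar \
  --multianno=input/test_pbta.hg38_multianno.txt \
  --autopvs1=input/test_pbta.autopvs1.tsv \
  --outdir=../results \
  --out="test_pbta"
NOTE: to confirm that test_pbta.custom_input.annotations_report.abridged.tsv does not contain duplicated columns, please comment out the last line from run_autogvp.sh:
rm $autogvp_input $vcf_parsed_file $out_dir/$autogvp_output $out_dir/$out_file.filtered_csq_subfields.tsv $out_dir/${out_file}_multianno_filtered.txt $out_dir/${out_file}_autopvs1_filtered.tsv $out_dir/${out_file}_intervar_filtered.txt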
Is there anything that you want to discuss further?
No
Documentation Checklist