diskin-lab-chop / AutoGVP


Rjcorb/247 add clinvar check #248

Closed rjcorb closed 3 months ago

rjcorb commented 4 months ago

Purpose/implementation Section

What feature is being added or bug is being addressed?

Closes #247. This PR adds a check that the ReviewStatus values in the ClinVar or sample VCF match those AutoGVP expects, and throws an error if they do not.

What was your approach?

Updated code in 02_annotate_variants*_input.R to check for ReviewStatus values that are currently not used for star assignment by AutoGVP.
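As a rough illustration of the kind of check described above, here is a minimal Python sketch (the PR itself changes R code, which is not shown in this thread). The set `EXPECTED_REVIEW_STATUSES` and both function names are hypothetical; the strings are standard ClinVar review status terms, but the exact list AutoGVP uses for star assignment is an assumption here.

```python
# Hypothetical list of review statuses that a star-assignment step knows about.
# These are ClinVar review status terms; the list AutoGVP actually uses may differ.
EXPECTED_REVIEW_STATUSES = {
    "practice guideline",
    "reviewed by expert panel",
    "criteria provided, multiple submitters, no conflicts",
    "criteria provided, conflicting interpretations",
    "criteria provided, single submitter",
    "no assertion criteria provided",
    "no assertion provided",
}


def find_unexpected_review_statuses(statuses, expected=EXPECTED_REVIEW_STATUSES):
    """Return the sorted, de-duplicated statuses that are not in `expected`."""
    observed = {status.strip() for status in statuses}
    return sorted(observed - set(expected))


def check_review_statuses(statuses, expected=EXPECTED_REVIEW_STATUSES):
    """Raise ValueError if any status would be skipped by star assignment."""
    unexpected = find_unexpected_review_statuses(statuses, expected)
    if unexpected:
        raise ValueError(
            "Unexpected ReviewStatus value(s): " + "; ".join(unexpected)
        )
```

Failing loudly, rather than silently assigning no stars, makes a change in ClinVar's vocabulary visible as soon as a new file is used.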

What GitHub issue does your pull request address?

#247

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Let me know if this error message should be updated in any way.

Is there anything that you want to discuss further?

Documentation Checklist

jharenza commented 3 months ago

I get the following error - should I be running this any other way?

root:/home/rstudio/AutoGVP# bash run_autogvp.sh --workflow="cavatica" \
--vcf=data/test_pbta.single.vqsr.filtered.vep_105.vcf \
--filter_criteria='FORMAT/DP>=10 (FORMAT/AD[0:1-])/(FORMAT/DP)>=0.2 (gnomad_3_1_1_AF_non_cancer<0.001|gnomad_3_1_1_AF_non_cancer=".")' \
--intervar=data/test_pbta.hg38_multianno.txt.intervar \
--multianno=data/test_pbta.hg38_multianno.txt \
--autopvs1=data/test_pbta.autopvs1.tsv \
--conceptIDs=data/clinvar_cpg_concept_ids.txt \
--conflict_res="most_severe" \
--outdir=results \
--out="test_pbta"
.
select ClinVar submission file not specified. Running select-ClinVar-submissions Rscript...
variant summary and/or submission_summary file(s) not specified. Checking if files exist in data/...
variant and submission summary files found. Running select-ClinVar-submissions Rscript...
resolving ClinVar conflicts with provided concept IDs and specified conflict resolution...
Warning message:
In left_join(., variant_summary_df, by = "VariationID", multiple = "all", :
Detected an unexpected many-to-many relationship between x and y.
i Row 21474 of x matches multiple rows in y.
i Row 2 of y matches multiple rows in x.
i If a many-to-many relationship is expected, set relationship = "many-to-many" to silence this warning.
Filtering VCF...
vcf file: data/test_pbta.single.vqsr.filtered.vep_105.vcf
cmd: bcftools view -f 'PASS,.' data/test_pbta.single.vqsr.filtered.vep_105.vcf | bcftools filter -i 'FORMAT/DP>=10' | bcftools filter -i '(FORMAT/AD[0:1-])/(FORMAT/DP)>=0.2' | bcftools filter -i '(gnomad_3_1_1_AF_non_cancer<0.001|gnomad_3_1_1_AF_non_cancer=".")' > results/test_pbta.filtered.vcf
Filtering multianno file...
Filtering autopvs1 file...
Filtering intervar file...
Running AutoGVP...
Error: unexpected numeric constant in:
" str_detect(INFO, "CLNREVSTAT") ~ "Other"
TRUE"
Execution halted

rjcorb commented 3 months ago

@jharenza this should run with your provided command now

rebkau commented 3 months ago

Still getting an error with the same command as above

root:/home/rstudio/AutoGVP# bash run_autogvp.sh --workflow="cavatica" --vcf=data/test_pbta.single.vqsr.filtered.vep_105.vcf --filter_criteria='FORMAT/DP>=10 (FORMAT/AD[0:1-])/(FORMAT/DP)>=0.2 (gnomad_3_1_1_AF_non_cancer<0.001|gnomad_3_1_1_AF_non_cancer=".")' --intervar=data/test_pbta.hg38_multianno.txt.intervar --multianno=data/test_pbta.hg38_multianno.txt --autopvs1=data/test_pbta.autopvs1.tsv --conceptIDs=data/clinvar_cpg_concept_ids.txt --conflict_res="most_severe" --outdir=results --out="test_pbta"
.
select ClinVar submission file not specified. Running select-ClinVar-submissions Rscript...
variant summary and/or submission_summary file(s) not specified. Checking if files exist in data/...
variant and submission summary files found. Running select-ClinVar-submissions Rscript...
resolving ClinVar conflicts with provided concept IDs and specified conflict resolution...
Warning message:
In left_join(., variant_summary_df, by = "VariationID", multiple = "all", :
Detected an unexpected many-to-many relationship between x and y.
i Row 21474 of x matches multiple rows in y.
i Row 2 of y matches multiple rows in x.
i If a many-to-many relationship is expected, set relationship = "many-to-many" to silence this warning.
run_autogvp.sh: line 165: 14818 Killed Rscript $BASEDIR/scripts/select-clinVar-submissions.R --variant_summary $variant_summary --submission_summary $submission_summary --outdir $out_dir --conceptID_list $conceptIDs --conflict_res $conflict_res
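
For context on the `left_join` warning in the log: dplyr reports a many-to-many relationship when at least one row of `x` matches several rows of `y` and at least one row of `y` matches several rows of `x`. The Python sketch below shows one way to find the join keys responsible before joining. The function name and the toy rows are hypothetical, and nothing in this thread confirms whether the warning is related to the `Killed` message.

```python
from collections import Counter


def join_multiplicity(left_rows, right_rows, key):
    """Summarise which join keys would produce one-to-many matches.

    left_rows and right_rows are lists of dicts; key is the join column.
    """
    left_counts = Counter(row[key] for row in left_rows)
    right_counts = Counter(row[key] for row in right_rows)

    # Keys where a left row matches more than one right row.
    left_matches_many = sorted(k for k in left_counts if right_counts[k] > 1)
    # Keys where a right row matches more than one left row.
    right_matches_many = sorted(k for k in right_counts if left_counts[k] > 1)

    return {
        "left_matches_many": left_matches_many,
        "right_matches_many": right_matches_many,
        # Both directions must occur for the relationship to be many-to-many.
        "many_to_many": bool(left_matches_many and right_matches_many),
    }


# Toy data only; real inputs would be the submission and variant summary tables.
submissions = [{"VariationID": 1}, {"VariationID": 2}, {"VariationID": 2}]
variants = [{"VariationID": 1}, {"VariationID": 1}, {"VariationID": 2}]
report = join_multiplicity(submissions, variants, "VariationID")
```

Here `report["many_to_many"]` is true because VariationID 1 is duplicated on the right and VariationID 2 on the left. The two directions do not need to share a key.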